Skip to main content

You want your LLM to read an entire novel? Well you can now

 


A few days ago I wrote about the leaked letter from Google 'Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months.'

Two days later and here is the proof of exactly that: 

MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k. Starting today, you can train, finetune, and deploy your own private MPT models, either starting from one of our checkpoints or training from scratch.'

As it turns out, the full text of The Great Gatsby weighs in at just under 68k tokens. So, naturally, we had StoryWriter read The Great Gatsby and generate an epilogue. .... StoryWriter took in The Great Gatsby in about 20 seconds (about 150k words-per-minute). Due to the long sequence length, its “typing” speed is slower than our other MPT-7B models, about 105 words-per-minute. 

This is somewhat of a game changer, a model that handle large amounts of text input, whole papers worth, that's been trained on 1T tokens, which has cost a small fraction of the amount that LLMs training has previously cost. I haven't got the hardware capable of running the model, but from the few accounts I've come across so far, it's a very capable model. 

Comments

Popular posts from this blog

The Whispers in the Machine: Why Prompt Injection Remains a Persistent Threat to LLMs

 Large Language Models (LLMs) are rapidly transforming how we interact with technology, offering incredible potential for tasks ranging from content creation to complex analysis. However, as these powerful tools become more integrated into our lives, so too do the novel security challenges they present. Among these, prompt injection attacks stand out as a particularly persistent and evolving threat. These attacks, as one recent paper (Safety at Scale: A Comprehensive Survey of Large Model Safety https://arxiv.org/abs/2502.05206) highlights, involve subtly manipulating LLMs to deviate from their intended purpose, and the methods are becoming increasingly sophisticated. At its core, a prompt injection attack involves embedding a malicious instruction within an otherwise normal request, tricking the LLM into producing unintended – and potentially harmful – outputs. Think of it as slipping a secret, contradictory instruction into a seemingly harmless conversation. What makes prompt inj...

The Future of Work in the Age of AGI: Opportunities, Challenges, and Resistance

 In recent years, the rapid advancement of artificial intelligence (AI) has sparked intense debate about the future of work. As we edge closer to the development of artificial general intelligence (AGI), these discussions have taken on a new urgency. This post explores various perspectives on employment in a post-AGI world, including the views of those who may resist such changes. It follows on from others I've written on the impacts of these technologies. The Potential for Widespread Job Displacement Avital Balwit, an employee at Anthropic, argues in her article " My Last Five Years of Work " that AGI is likely to cause significant job displacement across various sectors, including knowledge-based professions. This aligns with research by Korinek (2024), which suggests that the transition to AGI could trigger a race between automation and capital accumulation, potentially leading to a collapse in wages for many workers. Emerging Opportunities and Challenges Despite the ...

Podcast Soon Notice

I've been invited to make a podcast around the themes and ideas presented in this blog. More details will be announced soon. This is also your opportunity to be involved in the debate. If you have a response to any of the blog posts posted here, or consider an important issue in the debate around AGI is not being discussed, then please get in touch via the comments.  I look forward to hearing from you.