A few days ago I wrote about the leaked letter from Google: 'Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months.'
Two days later, here is proof of exactly that:
'MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k. Starting today, you can train, finetune, and deploy your own private MPT models, either starting from one of our checkpoints or training from scratch.'
'As it turns out, the full text of The Great Gatsby weighs in at just under 68k tokens. So, naturally, we had StoryWriter read The Great Gatsby and generate an epilogue. […] StoryWriter took in The Great Gatsby in about 20 seconds (about 150k words-per-minute). Due to the long sequence length, its "typing" speed is slower than our other MPT-7B models, about 105 words-per-minute.'
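Out of curiosity about that 68k figure: MPT-7B uses EleutherAI's GPT-NeoX-20B tokenizer, so a count like that could be reproduced in a few lines of Python. A minimal sketch, where gatsby.txt is a stand-in for a local plain-text copy of the novel:

```python
# Minimal sketch of reproducing a token count like MosaicML's, assuming the
# EleutherAI GPT-NeoX-20B tokenizer that MPT-7B uses; "gatsby.txt" is a
# hypothetical local plain-text copy of the novel.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

with open("gatsby.txt", encoding="utf-8") as f:
    text = f.read()

tokens = tokenizer.encode(text)
print(f"{len(tokens):,} tokens")  # just under 68k for the full novel, per MosaicML
```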
This is something of a game changer: a model that can handle large amounts of text input, whole papers' worth, trained on 1T tokens, at a small fraction of what LLM training has previously cost. I don't have hardware capable of running the model, but from the few accounts I've come across so far, it's a very capable model.
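For anyone who does have the hardware, the checkpoints are on the Hugging Face Hub, so loading the base model should look roughly like this. A minimal sketch, assuming the mosaicml/mpt-7b repository and that its custom model code needs trust_remote_code:

```python
# Minimal sketch: load the base MPT-7B checkpoint and generate a short
# completion. Assumes the "mosaicml/mpt-7b" Hugging Face repo and that its
# custom model code requires trust_remote_code=True; needs plenty of memory
# (a 7B model in bf16 is roughly 14 GB of weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b"  # the StoryWriter variant follows the same pattern
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

prompt = "Open-source language models matter because"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```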