How far can you get with a single GPU in just one day? - January 3, 2023

Companies continue to adopt and rely on language models such as BERT and RoBERTa in their operations, and it is important to consider not only the performance of these models but also their efficiency and cost-effectiveness. With this in mind, Jonas Geiping and Tom Goldstein published the paper “Cramming: Training a Language Model on a Single GPU in One Day,” answering the question, “How far can you get with a single GPU in just one day?”

The findings and applied methods can have an important impact on companies that may be using outdated public checkpoints, or that could benefit from applying transformers in their domain if given the opportunity. The authors also made the code available on GitHub.

News & Announcements 📣

Before 2022 was over, the Hugging Face Transformers team shared insights into the development of Transformers in 2022. The Transformers library grew to 300,000 daily pip installs and 1,000,000 weekly active users in 2022. 🤯🤯

François Chollet also shared insights on the usage and new features of Keras in 2022, and what is coming in 2023, in a Twitter thread.

Suppose you were testing the new PyTorch 2.0 features with PyTorch nightly builds during the Christmas break. In that case, you should reinstall it, since the PyTorch team reported that the nightly version was compromised between December 25 and December 30, 2022.

Tutorials & Demos 📝

I published a blog post on how to fine-tune google/flan-t5-base for chat & dialogue summarization using Hugging Face Transformers.

Deedy shows tricks from the latest ML research that help you get better results with ChatGPT, including chain-of-thought (CoT) reasoning & self-consistency.
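Self-consistency builds on CoT by sampling several reasoning paths for the same prompt and keeping the most common final answer, so that one faulty chain of thought gets outvoted. A minimal sketch of the voting logic, assuming a hypothetical `sample_completion` function that stands in for a sampled chat-model API call:

```python
from collections import Counter


def sample_completion(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for one sampled model completion.
    # Each reasoning path is assumed to end with "Answer: <value>".
    canned = [
        "There are 3 cars and each has 4 wheels. 3 * 4 = 12. Answer: 12",
        "Three cars times four wheels is twelve. Answer: 12",
        "3 + 4 = 7. Answer: 7",  # a faulty reasoning path
    ]
    return canned[seed % len(canned)]


def extract_answer(completion: str) -> str:
    # Pull the final answer out of the chain of thought.
    return completion.rsplit("Answer:", 1)[-1].strip()


def self_consistency(prompt: str, n_samples: int = 5) -> str:
    # Sample several chains of thought, then majority-vote the answers.
    answers = [extract_answer(sample_completion(prompt, s)) for s in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


prompt = "Q: How many wheels do 3 cars have? Let's think step by step."
print(self_consistency(prompt))  # prints "12": the majority outvotes the faulty path
```

With a real model you would replace `sample_completion` with temperature-based sampling; the voting step stays the same.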

Steven Warren wrote a blog post on how to create & deploy a Stable Diffusion Discord Bot on AWS with an event-driven & scalable architecture.

Reads & Papers 📚

Google & DeepMind published a paper on a new large language model aligned to the medical domain to generate safe and helpful answers, achieving SOTA performance on MedQA.

Muse: Text-To-Image Generation using Transformers by Google. Muse is a new text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models.

I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week 👋🏻👋🏻