Transformers can generate music! - January 31, 2023
Google introduces MusicLM, a model that generates high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff". The team behind MusicLM provides a demo page with example prompts and the generated outputs.
News & Announcements 📣
Transformers 4.26.0 dropped last week, including new models like BLIP, EfficientFormer, and GIT.
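Here is a minimal sketch of what image captioning with the newly added BLIP model could look like (the checkpoint name and the local image path are just illustrative):

```python
# Minimal sketch: image captioning with the newly added BLIP model.
# The checkpoint name and image path are illustrative, replace them with your own.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(generated[0], skip_special_tokens=True))
```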
Diffusers 0.12.0 is out with LoRA support for fast & memory-efficient Stable Diffusion fine-tuning and InstructPix2Pix.
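A minimal sketch of instruction-based image editing with the new InstructPix2Pix pipeline, assuming the "timbrooks/instruct-pix2pix" checkpoint and a GPU:

```python
# Minimal sketch: edit an image from a text instruction with InstructPix2Pix in Diffusers 0.12.
# The checkpoint and input image are assumptions for illustration.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.png").convert("RGB")
edited = pipe("turn the sky into a sunset", image=image, num_inference_steps=20).images[0]
edited.save("edited.png")
```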
Transformers Reinforcement Learning (TRL) got its second release, making it easy to train language models with RL (e.g., RLHF).
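A rough sketch of a single PPO step with TRL, modeled on the library's quickstart around this release; the constant reward is a stand-in for a real reward model:

```python
# Rough sketch of one PPO step with TRL; the hard-coded reward replaces a real reward model.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer
from trl.core import respond_to_batch

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

ppo_config = PPOConfig(batch_size=1, forward_batch_size=1)
ppo_trainer = PPOTrainer(ppo_config, model, ref_model, tokenizer)

query_tensor = tokenizer.encode("This morning I went to the ", return_tensors="pt")
response_tensor = respond_to_batch(model, query_tensor)  # sample a continuation

reward = [torch.tensor(1.0)]  # pretend a reward model liked the response
train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
```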
Text-to-Audio Diffusion is here! Check out the awesome samples from the Moûsai implementation and become a DJ.
Salesforce announced BLIP-2, an improved version of their BLIP model for tasks such as Visual Question Answering and image captioning.
Tutorials & Demos 📝
Andrej Karpathy dropped an awesome 2-hour video, “Let's build GPT: from scratch, in code, spelled out.”
Horace helps you learn how to reduce your GPU overhead with PyTorch 2.0.
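One of the main levers in PyTorch 2.0 is torch.compile; a minimal sketch of what that looks like:

```python
# Minimal sketch: compile a function with PyTorch 2.0 to cut Python and kernel-launch overhead.
import torch

def f(x):
    return torch.sin(x) + torch.cos(x)

compiled_f = torch.compile(f)  # captures and optimizes the function as a graph

x = torch.randn(1024, device="cuda" if torch.cuda.is_available() else "cpu")
print(compiled_f(x).sum())
```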
Neural Coder - One-Click Quantize 🤗 Transformers Models
I created a short post on how to efficiently deploy FLAN-T5-XXL (11B) on a single GPU using Hugging Face Inference Endpoints, and showed how to use the Hugging Face Transformers example scripts to fine-tune or pre-train Transformers models.
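The core idea for fitting the 11B model on a single GPU is 8-bit weight loading via accelerate and bitsandbytes; here is an illustrative snippet, not the exact Inference Endpoints handler from the post:

```python
# Illustrative sketch: load FLAN-T5-XXL with 8-bit weights on a single GPU.
# Requires `accelerate` and `bitsandbytes` to be installed.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xxl", device_map="auto", load_in_8bit=True
)

inputs = tokenizer("Translate to German: How are you?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```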
Reads & Papers 📚
Attention is all you need... but how much of it do you need? H3 is a new generative language model that outperforms GPT-Neo-2.7B with only 2 attention layers!
The “LangChain Chat” blog post explains how to build a chatbot agent that answers questions about LangChain’s documentation using a vector database.
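Conceptually, the retrieval step works like the sketch below: embed documentation chunks, index them in a vector store, and fetch the closest chunks for each question. The snippet uses sentence-transformers and FAISS purely for illustration; the post itself relies on LangChain’s own abstractions:

```python
# Concept sketch of retrieval over documentation chunks with a vector index.
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "LangChain provides chains that combine LLM calls with other components.",
    "A vector store holds embeddings so relevant documents can be retrieved.",
    "Agents decide which tools to call based on the user's question.",
]

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(docs, normalize_embeddings=True)

index = faiss.IndexFlatIP(doc_embeddings.shape[1])  # inner product == cosine on normalized vectors
index.add(doc_embeddings)

question = "How do I retrieve relevant documentation for a question?"
q_emb = embedder.encode([question], normalize_embeddings=True)
scores, ids = index.search(q_emb, k=2)
context = "\n".join(docs[i] for i in ids[0])  # context to pass to the LLM along with the question
```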
GLIGEN: Open-Set Grounded Text-to-Image Generation got released, proposing a new approach that builds upon and extends the functionality of existing pre-trained text-to-image diffusion models. Try out the demo!
Read about “Structured Pruning for Transformer-Based Models” to optimize and accelerate your BERT models by up to 25x.
CarperAI is releasing a series of diff models that predict code diffs; they are fine-tuned from Salesforce’s CodeGen code-synthesis models on millions of commits scraped from GitHub.
Sayak Paul wrote about The State of Computer Vision at Hugging Face 🤗, and the 🤗 science team explained the techniques behind ChatGPT: RLHF, IFT, CoT, red teaming, and more.
I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.
See you next week 👋🏻👋🏻