
ISSUE #3: Transformers can generate music!


Google introduces MusicLM, a model that generates high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff". The team behind MusicLM provides a demo page with example prompts and the generated outputs.

News & Announcements 📣

Transformers 4.26.0 dropped last week, including new models like BLIP, EfficientFormer, and GIT.
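
For instance, the new BLIP checkpoints can already be used for image captioning in a few lines. A minimal sketch, assuming the Salesforce/blip-image-captioning-base checkpoint and a COCO example image:

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# load the captioning checkpoint (assumption: Salesforce/blip-image-captioning-base)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# fetch an example image and generate a caption
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```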

Diffusers 0.12.0 is out with LoRA support for fast & memory-efficient Stable Diffusion fine-tuning and InstructPix2Pix.
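
As a rough sketch of how the new LoRA support can be used at inference time (assumptions: diffusers>=0.12 and a LoRA checkpoint you trained yourself with the train_text_to_image_lora.py example; the path below is a placeholder):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# load the small LoRA attention weights on top of the frozen base model
# (placeholder path to a checkpoint produced by the LoRA training example)
pipe.unet.load_attn_procs("./sd-lora-weights")

image = pipe("a photo of a corgi wearing sunglasses", num_inference_steps=30).images[0]
image.save("corgi.png")
```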

Transformers Reinforcement Learning (TRL) got its second release, making it easy to train language models with RL (e.g. RLHF).
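
A minimal single-step PPO sketch along the lines of the TRL quickstart (assumptions: trl 0.2+, gpt2 as both policy and reference model, and a dummy constant reward standing in for a real reward model or human feedback):

```python
import torch
from transformers import AutoTokenizer
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead
from trl.core import respond_to_batch

# policy model with a value head, plus a frozen reference copy for the KL penalty
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, forward_batch_size=1), model, ref_model, tokenizer)

# encode a query and sample a response from the current policy
query_tensor = tokenizer.encode("This morning I went to the ", return_tensors="pt")
response_tensor = respond_to_batch(model, query_tensor)

# dummy scalar reward; in RLHF this comes from a reward model trained on human preferences
reward = [torch.tensor(1.0)]
stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
```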

Text-to-Audio Diffusion is here! Check out the awesome samples from the Moûsai implementation and become a DJ.

Salesforce announced BLIP-2, an improved version of their BLIP model for tasks like visual question answering and image captioning.

Tutorials & Demos 📝

Andrej Karpathy dropped an awesome two-hour video, “Let's build GPT: from scratch, in code, spelled out.”

Horace He helps you learn how to reduce GPU overhead with PyTorch 2.0.

Neural Coder enables one-click quantization of 🤗 Transformers models.

I wrote a short post on how to efficiently deploy FLAN-T5-XXL (11B) on a single GPU using Hugging Face Inference Endpoints, and showed how to use the Hugging Face Transformers example scripts to fine-tune or pre-train Transformers models.
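
A minimal local sketch of one way to fit FLAN-T5-XXL on a single GPU by loading the weights in 8-bit (assumptions: transformers, accelerate, and bitsandbytes installed and roughly 24GB of GPU memory; the post itself wraps the model in a handler for Inference Endpoints):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-xxl"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# int8 weights via bitsandbytes + automatic device placement via accelerate
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("Translate to German: How old are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```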

Reads & Papers 📚

Attention is all you need... but how much of it do you need? H3, a new generative language model, outperforms GPT-Neo-2.7B with only two attention layers!

The “LangChain Chat” blog post explains how to build a chatbot agent that answers questions about LangChain’s documentation using a vector database.

GLIGEN: Open-Set Grounded Text-to-Image Generation got released, proposing a new approach that builds upon and extends the functionality of existing pre-trained text-to-image diffusion models. Try out the demo!

Read “Structured Pruning for Transformer-Based Models” to learn how to optimize and accelerate your BERT models by up to 25x.

CarperAI is releasing a series of diff models that predict code diffs, fine-tuned from Salesforce’s CodeGen code synthesis models on millions of commits scraped from GitHub.

Sayak Paul wrote about The State of Computer Vision at Hugging Face 🤗, and the 🤗 science team explained the techniques behind ChatGPT: RLHF, IFT, CoT, red teaming, and more.


I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week 👋🏻👋🏻

🗞️ Stay updated with bi-weekly Transformers & Cloud News and Insights delivered to your inbox