Issue 10: GPT-4o Multimodal Release and LoRA Insights - May 19, 2024

Disclaimer: This content is generated by AI from my social media posts. Make sure to follow me there for the originals.

This week’s highlights include the release of GPT-4o, new insights on LoRA, and the Apache 2.0 release of Yi 1.5.


GPT-4o Multimodal Release
OpenAI has released GPT-4o, a new multimodal LLM that processes text, audio, and vision in real time. It scores 88.7% on MMLU and cuts costs by up to 50% compared to GPT-4 Turbo. Despite these advancements, open-source AI remains strong and competitive.

Falcon 2 Unveiled
TII has introduced Falcon 2 11B, the first model in the Falcon 2 family. This dense decoder model was trained on 5.5 trillion tokens, supports multiple languages, and outperforms Llama 3 8B on TruthfulQA and GSM8K. It is available for commercial use on Hugging Face.

Apache 2.0 Release of Yi 1.5
We are excited to announce the Apache 2.0 release of Yi 1.5, which continues pre-training Yi on an additional 500B tokens. Available in 6B, 9B, and 34B sizes, Yi 1.5 excels at coding, reasoning, and instruction-following tasks, almost matching Meta Llama 3 70B on benchmarks.

MMLU-Pro Benchmark Released
TIGER-Lab has released MMLU-Pro on Hugging Face, a challenging multi-task language understanding dataset with 12K complex questions. GPT-4o's performance drops by 17% compared to the original MMLU, indicating the difficulty of this benchmark.


Effectiveness of LoRA for Fine-Tuning LLMs
The paper “LoRA Learns Less and Forgets Less” compares LoRA with full fine-tuning on programming and mathematics tasks. While memory-efficient, LoRA underperforms full fine-tuning on the target domain but better preserves the base model's performance on other tasks.
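LoRA's memory efficiency comes from its low-rank parameterization: instead of updating a full weight matrix, it trains two small factor matrices. A quick back-of-the-envelope calculation (using a hypothetical 4096x4096 projection layer, not a figure from the paper) shows why the savings are so large:

```python
# Sketch: why LoRA is memory-efficient. For a weight matrix of shape
# (d_out, d_in), full fine-tuning updates d_out * d_in parameters, while
# LoRA trains two low-rank factors B (d_out x r) and A (r x d_in).

def full_ft_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning the full weight matrix."""
    return d_out * d_in

def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter on the same matrix."""
    return r * (d_out + d_in)

# Hypothetical 4096x4096 attention projection with rank r=16
full = full_ft_params(4096, 4096)   # 16,777,216 trainable parameters
lora = lora_params(4096, 4096, 16)  # 131,072 trainable parameters
print(f"LoRA trains {lora / full:.2%} of the full parameter count")
```

Under one percent of the parameters receive gradients, which is what makes LoRA fit on much smaller GPUs, and also hints at why it learns less in the target domain.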

Tips for Using LoRA and Q-LoRA
For your next LLM project, consider these tips for using LoRA and Q-LoRA on GPUs like the A100: try a constant learning rate of 2e-4, start by targeting all linear modules, and set the rank based on your memory constraints (r=16 is a good starting point). Training for at least four epochs is recommended.
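The tips above can be collected into a starting configuration. This is a plain-dict sketch whose key names are modeled on common fine-tuning configs (e.g. peft's LoraConfig), not a real API call; adjust to whatever trainer you use:

```python
# Illustrative LoRA/Q-LoRA starting point for A100-class GPUs,
# mirroring the tips above. Key names follow peft/TRL conventions
# but this dict is a sketch, not a library object.
lora_run_config = {
    "learning_rate": 2e-4,           # constant LR works well here
    "lr_scheduler_type": "constant",
    "lora_r": 16,                    # rank; tune to your memory budget
    "target_modules": "all-linear",  # start by targeting all linear modules
    "num_train_epochs": 4,           # train for at least four epochs
}
```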


Building Conversational Agents with AWS
We recently demonstrated how to build your own conversational agent using Hugging Face and AWS at the AWS Summit Berlin. We showcased deploying Llama 3 on AWS Inferentia2, which offers an alternative to GPUs. Stay tuned for our upcoming blog post detailing the deployment process.

Introducing langchain-huggingface
Introducing langchain-huggingface, an open-source package that integrates Hugging Face models into LangChain. With easy installation via pip install langchain-huggingface, you can access open LLMs and embedding models for local or hosted use. Kudos to Harrison, Erick Friis, Kirill KONDRATENKO, Joffrey, Andrew, and Aymeric for this fantastic integration.

Supervised Fine-Tuning Tips
For those starting a new LLM project, here are some supervised fine-tuning tips for GPUs like the A100: begin with three epochs, use a learning rate of 2e-5 with a cosine schedule and a warmup ratio of 0.1, pack samples up to a sequence length of 2048, and try a global batch size of 256 or 512. For distributed training, both DeepSpeed and FSDP work well.
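As with the LoRA tips, these defaults can be written down as a starting configuration. This is a plain-dict sketch whose key names are modeled on Hugging Face TrainingArguments conventions; it is an illustrative starting point, not a real API call:

```python
# Illustrative SFT starting point for A100-class GPUs, mirroring the
# tips above. Key names follow TrainingArguments conventions but this
# dict is a sketch, not a library object.
sft_run_config = {
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "max_seq_length": 2048,    # pack samples up to this length
    "global_batch_size": 256,  # or 512; split across devices
}
```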

I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week 👋🏻👋🏻