Issue 9: DeepSeek v2 and the Buzz Dataset Revolutionize AI Training - May 11, 2024
This week, explore the transformative DeepSeek v2, dive into the expansive Buzz dataset, and discover groundbreaking research in LLM evaluation and multi-token prediction.
News
DeepSeek v2: A New Era in AI
DeepSeek's latest paper introduces DeepSeek v2, a Mixture-of-Experts (MoE) model with 236B total parameters, of which 21B are active per token, and a 128k context window. Its Multi-head Latent Attention (MLA) compresses the key-value cache to reduce memory demands during inference, while the MoE layers run only a subset of experts for each token. Learn more about the model.
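To make the "active parameters" idea concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not DeepSeek's actual router; the toy experts, the gate logits, and the function names `top_k_route` and `moe_forward` are all made up for illustration.

```python
import math

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Softmax over all experts (subtract the max for numerical stability).
    m = max(gate_logits)
    exps = [math.exp(x - m) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k most probable experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the kept weights sum to 1.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_logits, k=2):
    """Combine the outputs of the selected experts; the others are never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Hypothetical toy experts: each one is just a scalar function.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
gate_logits = [0.1, 2.0, 2.0, -1.0]  # assumed gate scores for one token

routes = top_k_route(gate_logits, k=2)
y = moe_forward(3.0, experts, gate_logits, k=2)

# Share of parameters active per token, using the figures quoted above.
active_fraction = 21 / 236
print(routes, y, f"{active_fraction:.1%}")
```

Only two of the four toy experts are evaluated for this input, which is the same reason a 236B model can run with roughly 9% of its parameters active per token.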
Buzz Dataset: The Future of Instruction Data
Alignment Lab AI presents Buzz, an extensive instruction dataset with 3.13 million rows and 85 million conversation turns. Curated from 435 sources, it supports configurations for SFT, RLHF, and Select Stack. Explore Buzz here.
Granite Code Models: A Leap in Open Code LLMs
IBM's Granite Code models, ranging from 3B to 34B parameters, excel in benchmarks and support 116 programming languages. These models are based on the Llama architecture and are available on Hugging Face under Apache 2.0. Read the paper here.
Whisper with Speaker Diarization
Enhance your transcription capabilities with our optimized Whisper model combined with speaker diarization. This solution uses Flash Attention and speculative decoding for ultra-fast inference. Check out the implementation.
Research
PROMETHEUS 2: Elevating LLM Evaluation
KAIST AI introduces PROMETHEUS 2, an open LLM built to evaluate other models, whose scores correlate highly with human and GPT-4 judgments. It supports both direct assessment (absolute grading) and pairwise ranking. Explore the model here.
Multi-Token Prediction in LLMs
Meta's latest paper on multi-token prediction explores training models to predict several future tokens at once instead of only the next one, improving learning speed and accuracy. The authors report that LLMs trained this way achieve better results without additional training time. Read the study here.
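As a rough illustration of what "predicting several future tokens at once" means for the training data, the sketch below builds the targets for each position. It only shows the data side, not the model or the loss, and the function name `multi_token_targets` and the example sentence are made up for illustration.

```python
def multi_token_targets(tokens, n_future):
    """For each position t, return the next n_future tokens as targets."""
    examples = []
    # Stop early enough that every position has a full set of n_future targets.
    for t in range(len(tokens) - n_future):
        context = tokens[: t + 1]
        targets = tokens[t + 1 : t + 1 + n_future]
        examples.append((context, targets))
    return examples

tokens = ["the", "cat", "sat", "on", "the", "mat"]

# n_future=1 is ordinary next-token prediction.
next_token = multi_token_targets(tokens, n_future=1)

# n_future=3 asks the model for three upcoming tokens at every position.
examples = multi_token_targets(tokens, n_future=3)
for context, targets in examples:
    print(context, "->", targets)
```

With `n_future=1` this reduces to the usual next-token setup, so the multi-token objective can be read as a generalization of it: every position now supplies several targets instead of one.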
General
Fine-Tuning Llama 3 on Amazon SageMaker
Learn how to fine-tune and deploy Llama 3 on Amazon SageMaker with PyTorch FSDP and Q-LoRA. This step-by-step guide shows how to improve training efficiency and reduce memory usage. Read the full guide here.
I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.
See you next week 👋🏻👋🏻