Issue 18: Google TPUs on Hugging Face, FlashAttention-3, and AgentInstruct - July 14, 2024

Disclaimer: This content is generated by AI from my social media posts. Make sure to follow me there.

This week's highlights include Google TPUs arriving on Hugging Face, FlashAttention-3's impressive speed boost, and Microsoft's innovative AgentInstruct for LLM training.

News

Google TPUs Now Available on Hugging Face

Exciting news for AI enthusiasts: Google Cloud TPUs are now available on Hugging Face Spaces and Inference Endpoints. This integration lets users build, train, and deploy generative AI models on Google TPUs. With options ranging from 16GB to 128GB of TPU memory and pricing starting at $1.38/hour, it's never been easier to get started with TPUs.

FlashAttention-3: Speeding Up Transformer Models

FlashAttention-3 is here, and it's turning heads with its impressive speed. This latest iteration is 1.5-2.0x faster than its predecessor, FlashAttention-2, when using FP16. It's not just about speed; FlashAttention-3 also supports FP8 while maintaining accuracy.
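The speedups come from never materializing the full attention score matrix: keys and values are processed block by block with an "online" softmax. Here is a minimal NumPy sketch of that idea (a didactic reference, not the fused FP16/FP8 CUDA kernels FlashAttention-3 actually ships):

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference attention: softmax(q k^T / sqrt(d)) v, materializing all scores."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def tiled_attention(q, k, v, block=4):
    """Tiled attention with an online softmax: key/value blocks are consumed one
    at a time, so only a block of scores exists at once -- the core trick behind
    the FlashAttention family."""
    d = q.shape[-1]
    out = np.zeros_like(q)
    row_max = np.full(q.shape[0], -np.inf)   # running max per query row
    row_sum = np.zeros(q.shape[0])           # running softmax denominator
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)            # scores for this block only
        new_max = np.maximum(row_max, s.max(axis=-1))
        scale = np.exp(row_max - new_max)    # rescale previous partial results
        p = np.exp(s - new_max[:, None])
        out = out * scale[:, None] + p @ vb
        row_sum = row_sum * scale + p.sum(axis=-1)
        row_max = new_max
    return out / row_sum[:, None]
```

Both functions produce identical outputs; the tiled version simply never stores the full score matrix, which is what makes long sequences feasible.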

Q-GaLore: Train Big, Save Memory

Ever wished you could train a 7B model with just 16GB of memory? Q-GaLore makes it possible. This memory-efficient training method combines low-rank gradient projection with quantization, reducing memory usage by up to 61% compared to full training. The best part? It achieves performance comparable to full-precision training.
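To make the two ingredients concrete, here is a toy NumPy sketch: the gradient is projected onto a low-rank subspace (so optimizer states shrink from m×n to r×n), and the projection matrix is stored in int8. This illustrates the idea only; Q-GaLore's actual quantization scheme and projection schedule differ in detail:

```python
import numpy as np

def lowrank_project(grad, rank):
    """GaLore-style projection: keep a rank-r subspace of the gradient so
    optimizer states are stored at size (rank x n) instead of (m x n)."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    p = u[:, :rank]              # projection matrix (m x rank)
    return p, p.T @ grad         # compact gradient (rank x n)

def quantize_int8(x):
    """Symmetric int8 quantization, standing in for Q-GaLore's low-bit storage."""
    scale = float(np.abs(x).max()) / 127 or 1.0   # guard against all-zero input
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float64) * scale
```

For a 64×32 gradient projected to rank 8, the stored optimizer state shrinks 8x, and the int8 projection matrix costs a quarter of its float32 footprint.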

NuminaMath 7B TIR: Solving Complex Math Problems

Meet NuminaMath 7B TIR, a task-specific LLM that outperforms most high school students in solving complex math problems. This model uses tool-integrated reasoning, combining Chain-of-Thought reasoning with Python REPLs in an agentic flow. Want to see it in action? Try out the demo on Hugging Face Spaces and experience the power of AI in mathematics.
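The tool-integrated reasoning loop is simple at its core: the model reasons in text, emits Python code blocks, and sees their output before continuing. Below is a minimal sketch of that loop with the model stubbed as a plain function; NuminaMath's real harness adds sandboxing, prompting details, and stopping criteria:

```python
import re, io, contextlib

def run_python(code):
    """Execute a model-proposed snippet and capture its stdout (sandboxing
    omitted for brevity -- a real system must isolate this)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def tool_integrated_reasoning(model, question, max_turns=4):
    """CoT + Python REPL loop: run each emitted code block, append its output
    to the transcript, and stop when the model replies without code."""
    transcript = question
    for _ in range(max_turns):
        reply = model(transcript)
        transcript += "\n" + reply
        match = re.search(r"```python\n(.*?)```", reply, re.S)
        if match is None:            # no code -> treat the reply as the answer
            return reply
        transcript += "\nOutput: " + run_python(match.group(1))
    return transcript
```

With a stub model that first emits `print(21 * 2)` and then states the answer, the loop executes the code, feeds back "42", and returns the final reply.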

Research

Revolutionizing LLM Training with Arena Learning

Want to create high-quality data for your LLM? Arena Learning might be the answer. This innovative method, developed by the creators of WizardLM, uses offline pairwise LLM battles to generate top-notch synthetic data. The process involves collecting diverse instruction data, running pairwise model battles, and iteratively training models on the winning outputs.
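One battle round can be sketched in a few lines: sample a pair of models per prompt, let a judge pick the better response, and keep the winner as a training example. The models and judge below are stand-ins for real LLM calls, and Arena Learning's actual judging and data-selection pipeline is considerably richer:

```python
import random

def arena_round(models, prompts, judge):
    """One offline battle round in the spirit of Arena Learning: for each prompt,
    two models respond, a judge picks the winner, and winning responses are
    collected as instruction-tuning data."""
    winning_data = []
    for prompt in prompts:
        a, b = random.sample(models, 2)
        resp_a, resp_b = a(prompt), b(prompt)
        winner = resp_a if judge(prompt, resp_a, resp_b) == "a" else resp_b
        winning_data.append({"instruction": prompt, "output": winner})
    return winning_data
```

Iterating this round, with the target model retrained on the winners each time, is the "iterative training" step the paper describes.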

AgentInstruct: Teaching LLMs New Tricks

Microsoft's AgentInstruct is redefining how we teach LLMs new skills. This multi-agent workflow transforms raw data into high-quality instructional content, improving a 7B model by ~20% across all benchmarks. The process involves data collection, content transformation, seed instruction generation, and instruction refinement.
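The four stages compose naturally as a pipeline. Here is a minimal sketch with each agent stubbed as a plain function; in AgentInstruct proper, each stage is one or more LLM agents with its own specialized prompt:

```python
def agent_instruct_pipeline(raw_texts, transformer, generator, refiner):
    """Sketch of AgentInstruct's stages: collected raw data is transformed,
    seed instructions are generated from it, and each seed is refined into a
    final training example."""
    dataset = []
    for text in raw_texts:                 # 1. collected raw data
        content = transformer(text)        # 2. content transformation
        seed = generator(content)          # 3. seed instruction generation
        dataset.append(refiner(seed))      # 4. instruction refinement
    return dataset
```

Swapping each stub for an LLM call (e.g. a transformation agent that rewrites a web page into an argument passage, then a suggester/editor pair for refinement) recovers the shape of the full workflow.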

RLHF: Breaking Language Barriers

Can RLHF transfer to different languages? Cohere's latest research suggests it can. Their experiments show that training on one or multiple languages improves performance on unseen languages, with online RLHF methods demonstrating stronger transfer capabilities than offline methods.

General

Winning the AI Math Olympiad: Hugging Face and Numina's Secret Sauce

How did Hugging Face and Numina clinch the AI Mathematical Olympiad (AIMO) 2024 Progress Prize? The secret lies in their innovative approach using custom synthetic datasets and two-stage fine-tuning. They leveraged the DeepSeekMath-Base 7B model and employed Self-Consistency and Tool Integrated Reasoning for inference. Want to replicate their success? Dive into the details in their comprehensive blog post.
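The Self-Consistency step is worth spelling out: sample many tool-integrated solutions per problem, then majority-vote over the final answers. A minimal sketch of that vote (failed samples dropped):

```python
from collections import Counter

def self_consistency(answers):
    """Majority vote over sampled final answers -- the Self-Consistency step
    applied on top of tool-integrated reasoning samples."""
    counts = Counter(a for a in answers if a is not None)   # drop failed samples
    return counts.most_common(1)[0][0] if counts else None

# e.g. self_consistency([42, 42, 17, 42, None]) -> 42
```

The intuition: a model may follow many different reasoning paths, but correct paths tend to converge on the same final answer more often than incorrect ones diverge to any single wrong answer.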

Simplifying LLM Evaluation for RAG Applications

Think LLM evaluation has to be complex? Think again. A new approach using additive scoring, chain-of-thought, and JSON schema is making waves. This method, inspired by G-EVAL and Self-Rewarding Language Models, uses Meta-Llama-3-70B-Instruct as the LLM Judge and a clear, small-scale scoring system. Ready to streamline your evaluation process? Learn how to implement this approach in your RAG applications by reading the detailed blog post.
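The recipe is concrete enough to sketch: a judge prompt that awards one point per criterion, asks for chain-of-thought first, and demands a JSON verdict that can be validated against a small schema. The criteria and wording below are illustrative, not the blog post's exact template:

```python
import json

# Hypothetical additive criteria for a RAG answer, one point each.
CRITERIA = [
    "The answer is grounded in the retrieved context.",
    "The answer directly addresses the question.",
    "The answer contains no unsupported claims.",
]

def build_judge_prompt(question, answer, context):
    """Additive-scoring judge prompt: chain-of-thought rationale first, then a
    JSON verdict with a small integer score."""
    points = "\n".join(f"- Add 1 point if: {c}" for c in CRITERIA)
    return (
        f"Question: {question}\nContext: {context}\nAnswer: {answer}\n\n"
        "Score the answer additively:\n" + points + "\n\n"
        "Think step by step, then reply with JSON: "
        '{"rationale": "...", "score": <0-' + str(len(CRITERIA)) + ">}"
    )

def parse_verdict(raw):
    """Validate the judge's JSON reply against the expected schema."""
    verdict = json.loads(raw)
    assert isinstance(verdict["rationale"], str)
    assert verdict["score"] in range(len(CRITERIA) + 1)
    return verdict["score"]
```

The small integer scale is the point: a 0-3 additive score is far easier for a judge model to apply consistently than an open-ended 1-100 rating.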


I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week 👋🏻👋🏻