Issue 27: Llama 3.2 Goes Multimodal, Google Speeds Up Gemini, and AI Designs Its Own Chips - September 30, 2024
This week's AI news roundup covers exciting developments in multimodal models with Llama 3.2 and Molmo, advances in LLM evaluation and training techniques, AI-powered chip design, and new tools for retrieval and prompt engineering.
News
Llama 3.2: Multimodal, Mobile, and EU Restricted
Meta releases Llama 3.2, featuring multimodal (text+image) capabilities in the Llama 3.2 Vision models and smaller, efficient text models for on-device deployment, as detailed on the Meta AI blog. However, access to the multimodal models is currently restricted in the EU. Explore the new features and access the models on their Hugging Face collection page.
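If you want to try the vision models locally, here is a minimal sketch using the Mllama integration in Transformers. The checkpoint id follows the gated meta-llama repository and the chat-template usage mirrors the published examples; treat both as assumptions to adapt to your setup.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Gated checkpoint: requires an approved access request on Hugging Face.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a chat-formatted prompt with one image placeholder plus a question.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image in one sentence."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

image = Image.open("example.jpg")  # any local image
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(output[0], skip_special_tokens=True))
```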
Molmo: An Open-Source Multimodal Powerhouse
Allen AI introduces Molmo, an open-source, Apache 2.0 licensed multimodal model outperforming Llama 3.2 and other open models on several benchmarks, as explained on their official website. Molmo, available in various sizes, uses a simpler architecture and emphasizes high-quality training data. Explore the models, demo, paper, and blog post on their Hugging Face collection page.
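Molmo currently ships its processing and generation logic as custom code on the Hub, so loading goes through trust_remote_code. The snippet below follows the usage pattern from the model card as I read it; method names such as process and generate_from_batch come from that remote code and may change.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

model_id = "allenai/Molmo-7B-D-0924"  # one of several sizes in the collection

# trust_remote_code pulls in Molmo's custom processor and generation helpers.
processor = AutoProcessor.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

inputs = processor.process(images=[Image.open("example.jpg")], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```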
OpenAI Releases Multilingual MMMLU Dataset
OpenAI releases the Multilingual Massive Multitask Language Understanding (MMMLU) dataset on Hugging Face. This dataset, available in 14 languages, facilitates the evaluation of AI models' general knowledge across different cultures.
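Loading the benchmark for a single language is a one-liner with the datasets library. The repo id and config name below are assumptions based on the Hub listing, so double-check them against the dataset card.

```python
from datasets import load_dataset

# Assumed repo id and per-language config (here: French); the dataset card
# lists the exact config names for all 14 languages.
mmmlu_fr = load_dataset("openai/MMMLU", "FR_FR", split="test")

print(mmmlu_fr[0])  # question, answer options, correct answer, and subject
```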
Mistral's Pixtral now on Hugging Face Transformers
Mistral AI's Pixtral is now accessible through Hugging Face Transformers using the "LlavaForConditionalGeneration" class, as explained on the model card. This integration simplifies access and expands Pixtral's usability.
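In practice that means Pixtral loads like any other Llava-style model. The checkpoint id and the [IMG]-token prompt format below follow the community conversion referenced on the model card and should be treated as assumptions.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "mistral-community/pixtral-12b"  # community conversion of Pixtral
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Pixtral uses Mistral-style [INST] tags with one [IMG] placeholder per image.
prompt = "<s>[INST]Describe this image in one sentence.\n[IMG][/INST]"
image = Image.open("example.jpg")
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(output[0], skip_special_tokens=True))
```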
MixEval Update Streamlines Open LLM Evaluation
MixEval receives an update simplifying open LLM evaluation using TGI or vLLM, as described in the MixEval repository. The update includes refreshed benchmark data, MMLU-Pro integration, and simplified MixEval-Hard sampling.
Gemini 1.5 002: Faster, Cheaper, and Better
Google releases the Gemini 1.5 002 models (updated Pro and Flash), featuring significant improvements in speed, cost, and performance, as highlighted on the Google AI blog. The update brings price reductions, faster responses, higher rate limits, and gains across various benchmarks.
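The updated models are selected by pinning the -002 suffix in the Gemini API. A minimal sketch with the google-generativeai SDK, assuming you have an API key configured:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # or set the GOOGLE_API_KEY environment variable

# The "-002" suffix pins the updated September 2024 revision of the model.
model = genai.GenerativeModel("gemini-1.5-flash-002")
response = model.generate_content("Summarize this week's AI news in one sentence.")
print(response.text)
```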
Research
Salesforce Explores DPO for Enhanced LLM Evaluation
Salesforce researchers investigate improving LLM evaluation with Direct Preference Optimization (DPO) and their SFR-Judges, as detailed in their arXiv paper. These generative judges demonstrate superior performance compared to other reward models.
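As a quick refresher on the objective such judges build on (this is the generic DPO loss, not Salesforce's training code), the policy's preference margin relative to a frozen reference model is pushed toward the chosen response:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Generic DPO objective on per-sequence log-probabilities."""
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-odds that the chosen response beats the rejected one.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy example with dummy sequence log-probabilities for one preference pair.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(loss.item())
```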
Meta Improves LLM Evaluators with Synthetic Data
Meta addresses the cost of human preference data in LLM evaluation through iterative self-improvement and synthetic data generation, as described in their research paper. This method significantly boosts Llama 3 70B's performance as an evaluator. Explore their work, including models and code, on their GitHub repository.
Google DeepMind Improves Mathematical Reasoning with OmegaPRM
Google DeepMind introduces OmegaPRM, a novel technique to enhance mathematical reasoning in LLMs through automated process supervision, explained in their research paper. This Monte Carlo Tree Search-based approach improves multi-step reasoning without human annotation.
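The core trick, as I read the paper, is locating the first faulty step in a solution with a divide-and-conquer search over Monte Carlo rollouts instead of labeling every step by hand. Here is a self-contained sketch of that binary search; the success_rate callable (standing in for "run several completions from this prefix and count how many reach the correct answer") is a placeholder of mine, not DeepMind's code.

```python
from typing import Callable, List

def longest_correct_prefix(steps: List[str],
                           success_rate: Callable[[List[str]], float],
                           threshold: float = 0.0) -> int:
    """Binary-search for the longest step prefix from which rollouts can still
    reach the correct final answer. If the result is < len(steps), the first
    erroneous step is steps[result]. Assumes errors are 'sticky': once a prefix
    fails, every longer prefix fails too."""
    lo, hi = 0, len(steps)  # the empty prefix is assumed recoverable
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if success_rate(steps[:mid]) > threshold:
            lo = mid      # rollouts still succeed: the error must be later
        else:
            hi = mid - 1  # rollouts fail: the error is at or before `mid`
    return lo

# Toy usage: pretend the third step (index 2) introduces the first error.
fake_rate = lambda prefix: 0.8 if len(prefix) <= 2 else 0.0
print(longest_correct_prefix(["s1", "s2", "s3", "s4"], fake_rate))  # -> 2
```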
General
Anthropic Advocates for Context Data Augmentation in RAG
Anthropic highlights the benefits of context data augmentation, or "contextual retrieval," to improve RAG application performance, as discussed in their blog post. This technique significantly reduces retrieval failure rates by enriching text chunks with additional context.
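The recipe is simple enough to sketch: before indexing, each chunk gets a short LLM-generated blurb that situates it within its source document, and the enriched chunk is what gets embedded and BM25-indexed. The prompt wording and model name below are illustrative assumptions, not Anthropic's exact template.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def contextualize_chunk(document: str, chunk: str) -> str:
    """Prepend an LLM-written context blurb to a chunk before indexing."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model id
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": (
                f"<document>\n{document}\n</document>\n"
                f"<chunk>\n{chunk}\n</chunk>\n"
                "Write a short context that situates this chunk within the "
                "document, to improve search retrieval. Answer with the context only."
            ),
        }],
    )
    context = response.content[0].text.strip()
    return context + "\n\n" + chunk  # this enriched text is what gets indexed
```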
Google DeepMind Uses AI for Chip Design
Google DeepMind employs AI to optimize chip design with AlphaChip, a reinforcement learning system, as described in their blog post. This approach leads to more efficient TPUs.
Evaluating LLMs with Gemini on Google Cloud
A helpful guide on Phil Schmid's blog shows how to evaluate open LLMs with Google Cloud's AI Evaluation Service, using Gemini 1.5 Pro as the judge. The guide covers deployment, evaluation metrics, and prompt comparison, with accompanying code on the GitHub repository.
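Stripped of the managed service, the underlying idea is a pointwise LLM-as-judge call. The sketch below is a simplified stand-in using the google-generativeai SDK rather than the evaluation client covered in the guide, and the rubric wording is my own.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
judge = genai.GenerativeModel("gemini-1.5-pro-002")

def score_answer(question: str, answer: str) -> str:
    """Ask Gemini to grade an answer on a 1-5 scale against a simple rubric."""
    prompt = (
        "You are an impartial grader. Rate the answer to the question on a 1-5 "
        "scale for correctness and helpfulness. Reply with the number only.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    return judge.generate_content(prompt).text.strip()

print(score_answer("What does RAG stand for?", "Retrieval-Augmented Generation."))
```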
Priompt: A React-Based Prompt Design Library
The team behind Cursor releases Priompt, a JSX-based, React-inspired library that simplifies prompt design for LLMs, as explained on their GitHub repository. The library provides a structured, component-driven approach to crafting effective prompts.
I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.
See you next week 👋🏻👋🏻