Issue 27: Llama 3.2 Goes Multimodal, Google Speeds Up Gemini, and AI Designs Its Own Chips - September 30, 2024

Disclaimer: This content is generated by AI from my social media posts. Make sure to follow me there for the original updates.

This week's AI news roundup covers exciting developments in multimodal models with Llama 3.2, advancements in LLM evaluation and training techniques, AI-powered chip design, and the release of powerful open-source code models.

News

Llama 3.2: Multimodal, Mobile, and EU Restricted

Meta releases Llama 3.2, featuring multimodal (text + image) capabilities in the Llama 3.2 Vision models and smaller, efficient models for on-device deployment, as detailed on the Meta AI blog. However, access to the multimodal models is currently restricted in the EU. Explore the new features and access the models on their Hugging Face collection page.

Molmo: An Open-Source Multimodal Powerhouse

Allen AI introduces Molmo, an open-source, Apache 2.0 licensed multimodal model outperforming Llama 3.2 and other open models on several benchmarks, as explained on their official website. Molmo, available in various sizes, uses a simpler architecture and emphasizes high-quality training data. Explore the models, demo, paper, and blog post on their Hugging Face collection page.

OpenAI Releases Multilingual MMMLU Dataset

OpenAI releases the Multilingual Massive Multitask Language Understanding (MMMLU) dataset on Hugging Face. This dataset, available in 14 languages, facilitates the evaluation of AI models' general knowledge across different cultures.

Mistral's Pixtral now on Hugging Face Transformers

Mistral AI's Pixtral is now accessible through Hugging Face Transformers using the "LlavaForConditionalGeneration" class, as explained on the model card. This integration simplifies access and expands Pixtral's usability.

MixEval Update Streamlines Open LLM Evaluation

MixEval receives an update simplifying open LLM evaluation using TGI or vLLM, as described on the MixEval repository. The update includes refreshed benchmark data, MMLU-Pro integration, and simplified MixEval-Hard sampling.

Gemini 1.5 Pro-002 and Flash-002: Faster, Cheaper, and Better

Google releases Gemini 1.5 Pro-002 and Flash-002, featuring significant improvements in speed, cost, and performance, as highlighted on the Google AI blog. Enhancements include price reductions, faster responses, higher rate limits, and improved performance across various benchmarks.

Research

Salesforce Explores DPO for Enhanced LLM Evaluation

Salesforce research investigates improving LLM evaluation with Direct Preference Optimization (DPO) and their SFR-Judge models, detailed in their arXiv paper. These generative judges outperform other reward models across evaluation benchmarks.
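
For context, the DPO objective that judge-training work like this builds on fits in a few lines. The sketch below is a generic, single-pair version of the standard DPO loss; the log-ratio inputs and the beta value are illustrative, not Salesforce's actual training setup:

```python
import math

def dpo_loss(chosen_logratio: float, rejected_logratio: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each logratio is log pi_theta(y|x) - log pi_ref(y|x): the policy's
    log-probability of a response minus the frozen reference model's.
    """
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written stably as log1p(exp(-margin))
    return math.log1p(math.exp(-margin))

# The loss shrinks as the policy favors the chosen response over the rejected one.
low = dpo_loss(chosen_logratio=2.0, rejected_logratio=-2.0)
high = dpo_loss(chosen_logratio=-2.0, rejected_logratio=2.0)
```

Minimizing this pushes the policy to assign relatively higher probability to preferred responses without needing an explicit reward model.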

Meta Improves LLM Evaluators with Synthetic Data

Meta addresses the cost of preference data in LLM evaluation through iterative self-improvement and synthetic data generation described in their research paper. This method significantly boosts Llama 3 70B's performance. Explore their work, including models and code, on their GitHub repository.
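
The loop described above can be sketched with stand-in functions for response corruption, judging, and fine-tuning. All names and the accuracy bookkeeping below are hypothetical; the real method prompts an LLM for reasoned judgments over synthetic good/bad response pairs and fine-tunes on the verdicts it gets right:

```python
import random

def corrupt(answer: str) -> str:
    """Stub: derive a deliberately worse response (the paper prompts an LLM
    for a plausible-but-flawed variant; here we simply truncate)."""
    return answer[: max(1, len(answer) // 2)]

def judge(model: dict, good: str, bad: str) -> str:
    """Stub judge: picks the good answer with probability equal to the
    model's current accuracy (a real judge generates a reasoned verdict)."""
    return "good" if random.random() < model["accuracy"] else "bad"

def self_improve(model: dict, instructions: list[str], rounds: int = 3) -> dict:
    for _ in range(rounds):
        training_pairs = []
        for inst in instructions:
            good = f"answer to {inst}"
            bad = corrupt(good)
            if judge(model, good, bad) == "good":  # keep only correct verdicts
                training_pairs.append((inst, good, bad))
        # Stub "fine-tuning": correct verdicts nudge accuracy upward.
        model["accuracy"] = min(
            0.99, model["accuracy"] + 0.05 * len(training_pairs) / len(instructions)
        )
    return model

random.seed(0)
trained = self_improve({"accuracy": 0.6}, [f"q{i}" for i in range(50)])
```

The key property, preserved in the stub, is that no human preference labels enter the loop: the model's own correct judgments become the next round's training data.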

Google DeepMind Improves Mathematical Reasoning with OmegaPRM

Google DeepMind introduces OmegaPRM, a novel technique to enhance mathematical reasoning in LLMs through automated process supervision, explained in their research paper. This Monte Carlo Tree Search-based approach improves multi-step reasoning without human annotation.
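
The core trick is a divide-and-conquer search over solution prefixes: Monte Carlo rollouts from each prefix estimate whether it can still reach the correct answer, which pinpoints the first erroneous step without any human labels. A toy sketch, with the rollout estimator stubbed out (a real system samples LLM completions from the prefix and checks their final answers):

```python
def rollout_success_rate(prefix_len: int, true_first_error: int) -> float:
    """Stub for Monte Carlo rollouts: any prefix that already contains the
    error (length > true_first_error) can never recover; otherwise some
    fraction of completions would reach the correct answer."""
    return 0.0 if prefix_len > true_first_error else 0.7

def locate_first_error(n_steps: int, true_first_error: int) -> int:
    """Binary-search the longest solution prefix that is still recoverable."""
    lo, hi = 0, n_steps  # invariant: a prefix of length lo is recoverable
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if rollout_success_rate(mid, true_first_error) > 0:
            lo = mid  # error must come later
        else:
            hi = mid - 1  # error is at or before step index mid - 1
    return lo  # steps 0..lo-1 judged correct; step lo is the first error
```

Binary search needs only O(log n) rollout batches per solution instead of checking every step, which is what makes the annotation-free supervision cheap enough to scale.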

General

Anthropic Advocates for Context Data Augmentation in RAG

Anthropic highlights the benefits of context data augmentation, or "contextual retrieval," to improve RAG application performance, as discussed in their blog post. This technique significantly reduces retrieval failure rates by enriching text chunks with additional context.
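
The idea can be sketched in a few lines: generate a short situating blurb for each chunk and prepend it before embedding or BM25 indexing. The LLM call is stubbed below with a trivial heuristic; Anthropic's actual prompt asks a model to situate the chunk within the full document:

```python
def situate_chunk(document: str, chunk: str) -> str:
    """Stub for the LLM call that produces a short context blurb.
    Here we fake it with the document's first line."""
    title = document.splitlines()[0]
    return f"From '{title}': "

def contextualize(document: str, chunks: list[str]) -> list[str]:
    """Prepend generated context to each chunk before indexing it."""
    return [situate_chunk(document, chunk) + chunk for chunk in chunks]

doc = "ACME Corp Q2 2023 10-Q filing\nRevenue grew 3% over the previous quarter.\n..."
chunks = ["Revenue grew 3% over the previous quarter."]
augmented = contextualize(doc, chunks)
```

The augmented chunk now matches queries like "ACME Q2 revenue" that the bare sentence, stripped of its document context, would miss.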

Google DeepMind Uses AI for Chip Design

Google DeepMind employs AlphaChip, a reinforcement learning system, to optimize chip layout, as described in their blog post. This approach has produced more efficient TPU designs.

Evaluating LLMs with Gemini on Google Cloud

Learn how to evaluate open LLMs using Google Cloud's AI Evaluation Service with Gemini 1.5 Pro as the judge via a helpful guide on Phil Schmid's blog. The guide covers deployment, evaluation metrics, and prompt comparison, with accompanying code on the GitHub repository.
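
The guide uses Gemini 1.5 Pro through Google Cloud's evaluation service; a generic pairwise LLM-as-judge scaffold looks roughly like this (the template wording and helper names are illustrative, and the actual model call is omitted):

```python
JUDGE_TEMPLATE = """You are an impartial judge. Compare two answers to the user question.

Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}

Reply with exactly one letter: "A" if Answer A is better, "B" otherwise."""

def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Fill the judge template; the result is sent to the judge model."""
    return JUDGE_TEMPLATE.format(question=question, answer_a=answer_a, answer_b=answer_b)

def parse_verdict(raw: str) -> str:
    """Normalize the judge model's reply to 'A' or 'B'."""
    verdict = raw.strip().upper()[:1]
    if verdict not in ("A", "B"):
        raise ValueError(f"unparseable verdict: {raw!r}")
    return verdict
```

In practice you would also swap the A/B positions and judge each pair twice, since judge models are known to exhibit position bias.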

Priompt: A React-Based Prompt Design Library

The team behind Cursor AI releases Priompt, a React-based library simplifying prompt design for LLMs as explained on their GitHub repository. This library provides a structured, JSX-driven approach to crafting effective prompts.


I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week 👋🏻👋🏻