Issue 24: Reflection 70B Outshines GPT-4, FLUX on Google Cloud & DeepSeek-V2.5 Raises the Bar - September 9, 2024

Disclaimer: This content is generated by AI using my social media posts. Make sure to follow me there.

This week's AI news roundup features exciting model releases like Reflection 70B and DeepSeek-V2.5, plus insights into improving reward models and leveraging AI for software reliability. We also explore new Google Cloud documentation for Hugging Face, the release of two new code LLMs by 01.Ai, and an invitation to a Hugging Face party at the PyTorch Conference.

News

Reflection 70B: A New Benchmark in LLM Performance

Reflection 70B, an open model built on Llama 3.1 70B, has surpassed Anthropic's Claude 3.5 Sonnet and OpenAI's GPT-4o using Reflection-Tuning, a technique that trains the model to detect and correct its own reasoning mistakes in explicit reflection steps before committing to a final answer. This result suggests we're not hitting a ceiling on LLM performance and underscores the importance of high-quality synthetic data in training.

DeepSeek-V2.5: Merging Strengths for Better Performance

DeepSeek has released version 2.5, merging DeepSeek-V2-0628 and DeepSeek-Coder-V2-0724 into a single 236B-parameter Mixture-of-Experts model. The merger brings improved benchmark scores, native function calling, and a JSON output mode, combining strong general chat and coding abilities in one model.
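The JSON output mode is exposed through DeepSeek's OpenAI-compatible API. Below is a minimal sketch; the model name and the exact `response_format` value follow the OpenAI-compatible convention and should be treated as assumptions to verify against DeepSeek's documentation.

```python
# Minimal sketch: requesting structured JSON from DeepSeek-V2.5 via its
# OpenAI-compatible API. Model name and response_format value are assumptions
# based on the OpenAI-compatible convention; check DeepSeek's docs before use.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",          # placeholder
    base_url="https://api.deepseek.com",      # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                    # served by DeepSeek-V2.5
    response_format={"type": "json_object"},  # ask for valid JSON output
    messages=[
        {"role": "system", "content": "Reply with a JSON object containing 'language' and 'year'."},
        {"role": "user", "content": "When was Python first released?"},
    ],
)

print(response.choices[0].message.content)    # e.g. {"language": "Python", "year": 1991}
```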

Yi-Coder: Small Models, Big Impact

01.AI has introduced Yi-Coder, a new code LLM available in 1.5B and 9B versions. Despite its compact size, the 9B variant outperforms CodeQwen1.5 7B and rivals much larger models like DeepSeek-Coder 33B, showing that efficient training can sometimes trump sheer parameter count.
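If you want to try the 9B chat variant locally, here is a minimal sketch using Hugging Face Transformers; the repository id is my assumption of the published checkpoint name, so double-check it on the Hub.

```python
# Minimal sketch: running Yi-Coder-9B-Chat locally with Transformers.
# The repo id is an assumption; verify the exact name on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-Coder-9B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that checks if a string is a palindrome."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```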

Research

Critique-out-Loud: A Novel Approach to Reward Models

Researchers have developed "Critique-out-Loud" reward models, which first generate an explicit natural-language critique of a response and then condition the reward score on that critique. The approach delivers performance improvements of up to 5.84% on RewardBench and could change how we train and refine reward models for alignment.
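The core idea is a two-step scoring pass: critique first, then a critique-conditioned reward. The sketch below is purely illustrative; the `reward_model` object and its `generate_critique` and `score` methods are hypothetical stand-ins, not the authors' released code.

```python
# Illustrative sketch of the Critique-out-Loud idea: critique first, then score.
# `reward_model` and its methods are hypothetical stand-ins for the paper's setup.
from dataclasses import dataclass

@dataclass
class ScoredResponse:
    response: str
    critique: str
    reward: float

def score_with_critique(reward_model, prompt: str, response: str) -> ScoredResponse:
    # Step 1: the model verbalizes an explicit critique of the candidate response.
    critique = reward_model.generate_critique(prompt, response)
    # Step 2: the scalar reward is conditioned on prompt, response AND critique.
    reward = reward_model.score(prompt, response, critique)
    return ScoredResponse(response, critique, reward)

def rank_candidates(reward_model, prompt: str, candidates: list[str]) -> list[ScoredResponse]:
    # Usage: rank several candidate responses by their critique-conditioned reward.
    scored = [score_with_critique(reward_model, prompt, c) for c in candidates]
    return sorted(scored, key=lambda s: s.reward, reverse=True)
```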

Meta's AI-Powered Incident Response

Meta has created an AI assistant powered by Llama to streamline incident response and investigation processes in software systems. Achieving 42% accuracy in identifying root causes, this system demonstrates the potential of AI in enhancing software reliability and maintenance.

General

Google Cloud and Hugging Face Join Forces

Google Cloud has launched comprehensive documentation for using Hugging Face on its platform. This collaboration opens up exciting possibilities for AI development on Google Cloud, covering everything from fine-tuning LLMs on Vertex AI to deploying models on GKE.
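As one example of what the docs cover, deploying a Hugging Face model to a Vertex AI endpoint with the google-cloud-aiplatform SDK looks roughly like the sketch below. The container URI, model id, and machine configuration are placeholders; the official documentation lists the exact Deep Learning Container images and recommended hardware.

```python
# Rough sketch: serving a Hugging Face model on Vertex AI with a
# Text Generation Inference (TGI) container. Container URI, model id and
# hardware settings are placeholders; consult the official docs for real values.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # placeholders

model = aiplatform.Model.upload(
    display_name="llama-3-8b-instruct-tgi",
    serving_container_image_uri="<huggingface-tgi-dlc-uri>",        # placeholder DLC image
    serving_container_environment_variables={
        "MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct",           # example model id
    },
    serving_container_ports=[8080],
)

endpoint = model.deploy(
    machine_type="g2-standard-12",      # example machine type
    accelerator_type="NVIDIA_L4",       # example accelerator
    accelerator_count=1,
)

print(endpoint.predict(instances=[{"inputs": "Hello!"}]))
```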

FLUX: AI Image Generation Comes to Google Cloud

Black Forest Labs' FLUX, a state-of-the-art text-to-image model, can now be easily deployed on Google Vertex AI. Outperforming Stable Diffusion 3 and Midjourney v6.0, FLUX offers both open-weight and commercial variants, making advanced AI image generation more accessible than ever.
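Beyond the managed Vertex AI deployment, the open FLUX.1 [schnell] weights can also be run locally with the Diffusers library. Here is a minimal sketch; the repository id and sampling settings are what I'd expect for the schnell checkpoint, so verify them against the model card.

```python
# Minimal sketch: text-to-image with FLUX.1 [schnell] via Diffusers.
# Repo id and inference settings are assumptions; verify against the model card.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # offloads weights to CPU to fit on a single GPU

image = pipe(
    prompt="a photorealistic red fox reading a newsletter, golden hour lighting",
    guidance_scale=0.0,        # schnell is distilled for guidance-free sampling
    num_inference_steps=4,     # schnell targets very few denoising steps
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]

image.save("flux_schnell.png")
```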

Anthropic's Meta Prompt: Optimizing AI Interactions

Anthropic has open-sourced its "meta" prompt for generating and optimizing Claude prompts. The prompt can be run locally, letting developers refine their prompts for better results, including against Claude models served through services like AWS Bedrock.
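A rough sketch of running it locally with the Anthropic Python SDK is below; the file name, the {{TASK}} placeholder convention, and the model name are my assumptions rather than a documented interface, so adapt them to the prompt you actually download.

```python
# Rough sketch: feeding Anthropic's meta prompt to Claude to generate an
# optimized, task-specific prompt. File name, placeholder token and model name
# are assumptions; adapt them to the actual open-sourced prompt.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

meta_prompt = open("metaprompt.txt").read()          # the downloaded meta prompt (assumed filename)
task = "Summarize a customer support ticket into a priority label and a one-line summary."

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",              # example model name
    max_tokens=4096,
    messages=[{"role": "user", "content": meta_prompt.replace("{{TASK}}", task)}],
)

print(message.content[0].text)  # the generated prompt template for the task
```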

Hugging Face Party at PyTorch Conference

If you're attending the PyTorch Conference in San Francisco, don't miss the Hugging Face party on September 19th. It's a great opportunity to network with Hugging Face team members and the community while enjoying free food and drinks.


I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week 👋🏻👋🏻