Issue 28: OpenAI DevDay, New Whisper Model, and LLM Prices Keep Dropping - October 7, 2024
This week's AI newsletter dives into the latest news from OpenAI's DevDay, including a new Realtime API and vision model fine-tuning, explores advancements in LLM reasoning and reward hacking mitigation, and covers new model releases like FLUX1.1 [pro] and Whisper V3 Turbo.
News
OpenAI DevDay 2024: A Plethora of New Features
OpenAI's DevDay 2024 unveiled a host of new features, with swyx and Simon Willison sharing updates on social media, including a real-time voice API enabling interactive applications, vision model fine-tuning capabilities, cost-saving prompt caching, model distillation tools, and enhanced structured output generation.
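Of the DevDay announcements, structured output generation is the most directly usable from code: the Chat Completions request carries a JSON Schema that the model's output must conform to. A minimal sketch of building such a request body (the schema and model name below are illustrative assumptions, not from the announcement):

```python
import json

def build_structured_request(model: str, user_prompt: str) -> dict:
    """Build a Chat Completions payload that constrains output to a JSON schema."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "event_extraction",  # illustrative schema name
                "strict": True,  # strict mode rejects outputs deviating from the schema
                "schema": {
                    "type": "object",
                    "properties": {
                        "event": {"type": "string"},
                        "date": {"type": "string"},
                    },
                    "required": ["event", "date"],
                    "additionalProperties": False,
                },
            },
        },
    }

payload = build_structured_request("gpt-4o-2024-08-06", "DevDay was held on October 1.")
print(json.dumps(payload, indent=2))
```

The payload would be POSTed to the Chat Completions endpoint as usual; the model then returns JSON matching the schema rather than free-form text.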
FLUX1.1 [pro]: Faster and Higher-Quality Image Generation
Black Forest Labs released FLUX1.1 [pro], a significantly faster and improved text-to-image model boasting a 6x speed increase and enhanced image quality, achieving top scores on artificialanalysis.ai. It's also available via platforms like together.ai and Replicate.
New Whisper Model: V3 Turbo
OpenAI quietly released a new Whisper V3 Turbo model, an optimized version offering 8x faster transcription speed with minimal accuracy degradation compared to the large-v3 model.
LLM Pricing Updates
The LLM pricing landscape continues to shift, with updates from OpenAI, Google DeepMind, Mistral AI, Cohere, and Cloudflare, impacting models like Gemini Pro, Cohere Command, Mistral Large, and Llama 3.1.
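Comparing providers usually comes down to per-million-token arithmetic over your own input/output mix. A minimal sketch (the prices used below are hypothetical placeholders, not current quotes from any provider):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost in USD of one request, given per-million-token input/output prices."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# Hypothetical placeholder prices in USD per million tokens.
cost = request_cost(10_000, 2_000, price_in_per_m=3.0, price_out_per_m=12.0)
print(f"${cost:.4f}")  # → $0.0540
```

Plugging in each provider's published rates for the same token mix makes the price updates directly comparable.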
Hugging Face on Google Cloud Documentation Expanded
The Hugging Face on Google Cloud documentation received a significant update, now featuring 16 hands-on examples covering fine-tuning, deployment, and various AI tasks using platforms like Vertex AI and GKE.
Research
Boosting LLM Reasoning with Prompt Engineering
Research suggests that combining dynamic chain of thought prompting, reflection, and verbal reinforcement can significantly boost the reasoning performance of smaller LLMs. Access the prompt itself and the related GitHub repository.
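The technique amounts to a system prompt that asks the model to reason in variable-length steps, then critique and revise its own reasoning before answering. A minimal sketch of assembling such a prompt (the tag names and wording here are illustrative, not the exact prompt from the research):

```python
def build_reasoning_prompt(question: str) -> list[dict]:
    """Wrap a question in a dynamic chain-of-thought + reflection system prompt."""
    system = (
        "Reason step by step inside <thinking> tags, adapting the number of "
        "steps to the difficulty of the problem. Then critique your own "
        "reasoning inside <reflection> tags, correct any mistakes you find, "
        "and give the final answer inside <output> tags."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_reasoning_prompt("A train travels 120 km in 90 minutes. What is its average speed in km/h?")
```

The resulting `messages` list can be sent to any chat-completion-style endpoint; the reflection step is what lets smaller models catch and repair their own reasoning errors.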
Challenging LLM Math Abilities: The Reasoning Gap
A new paper explores the limitations of LLMs in multi-hop mathematical reasoning, revealing a "reasoning gap" between their performance on individual problems and chained problems, investigating various models like the Gemini series and GPT-4 variants.
Mitigating Reward Hacking with CGPO
Meta introduces Constrained Generative Policy Optimization (CGPO), a novel RLHF method utilizing a Mixture of Judges (MoJ) to prevent reward hacking and improve performance, and the related TRL pull request offers further implementation details.
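At a high level, the Mixture of Judges gates the reward signal with multiple independent constraint checks, so a completion that games the reward model but violates a constraint earns no credit. A toy sketch of that gating idea (this is an illustration of the concept, not Meta's implementation):

```python
from typing import Callable

# A judge returns True if the completion satisfies its constraint.
Judge = Callable[[str], bool]

def gated_reward(completion: str, raw_reward: float, judges: list[Judge]) -> float:
    """Zero out the reward when any judge flags a constraint violation."""
    if all(judge(completion) for judge in judges):
        return raw_reward
    return 0.0  # violating completions get no credit, so hacking the RM doesn't pay

# Toy judges: a length cap and a ban on a degenerate phrase the RM over-rewards.
judges = [
    lambda text: len(text) < 500,
    lambda text: "as an ai" not in text.lower(),
]
print(gated_reward("The answer is 42.", raw_reward=1.0, judges=judges))  # → 1.0
```

In CGPO the judges are LLM- and rule-based and the optimization is constrained rather than a hard zeroing, but the core idea is the same: reward only what passes every check.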
FRAMES: A New Benchmark for RAG
Google released FRAMES, a comprehensive evaluation dataset designed to assess Retrieval-Augmented Generation (RAG) applications on factuality, retrieval accuracy, and reasoning skills, focusing on challenging multi-hop questions.
General
OpenAI's Realtime API: Building Interactive Applications
OpenAI's new Realtime API offers developers the ability to build interactive applications with real-time text and audio processing using WebSockets and function calling.
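Clients talk to the Realtime API by exchanging JSON events over a WebSocket: the client sends events such as `response.create` and receives streamed text and audio deltas back. A minimal sketch of composing one client event (the event shape follows the announced API, but field details may differ; verify against the official reference):

```python
import json

def make_response_event(instructions: str, modalities=("text", "audio")) -> str:
    """Serialize a client event asking the model to generate a response."""
    event = {
        "type": "response.create",
        "response": {
            "modalities": list(modalities),
            "instructions": instructions,
        },
    }
    return json.dumps(event)

# In a real client this string would be sent over the open WebSocket,
# e.g. with the `websockets` library: await ws.send(make_response_event("Greet the user."))
print(make_response_event("Greet the user."))
```

The streamed server events arriving on the same socket then carry the model's incremental audio and text output.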
Llama 3.2 Vision: A Deep Dive into Multimodal Capabilities
An analysis of Meta's Llama 3.2 Vision explores its multimodal capabilities, comparing its performance to GPT-4o in various tasks, including image understanding, medical image analysis, and financial chart analysis.
OpenAI's "Meta" Prompt Potentially Leaked
A "meta" prompt for optimizing GPT prompts, similar to Anthropic's prompt generator, might have been discovered shortly after OpenAI launched a related playground feature.
Fine-tuning Multimodal LLMs with Hugging Face TRL
A practical guide demonstrates how to fine-tune open multimodal LLMs and VLMs like Qwen2-VL 7B using Hugging Face TRL, with the accompanying code example providing practical implementation details.
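Such fine-tunes typically begin by converting each raw example into the chat-message format the trainer expects, with an image placeholder in the user turn that the processor later pairs with the actual image. A minimal sketch of that conversion (the column names `image`/`question`/`answer` are assumptions about the dataset, not from the guide):

```python
def to_chat_format(example: dict) -> dict:
    """Convert a raw (image, question, answer) row into chat messages for SFT."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},  # placeholder; the processor pairs it with the image below
                    {"type": "text", "text": example["question"]},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": example["answer"]}],
            },
        ],
        "images": [example["image"]],
    }

row = {"image": "<PIL.Image object>", "question": "What product is shown?", "answer": "A red kettle."}
formatted = to_chat_format(row)
```

Mapping this function over the dataset yields conversations the TRL trainer and the model's chat template can consume directly.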
Single-Agent vs. Multi-Agent Systems for AI Agents
A blog post discusses the trade-offs between single-agent and multi-agent systems for building LLM-based AI agents, highlighting the advantages of well-designed single-agent systems.
I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.
See you next week 👋🏻👋🏻