Issue 17: Moshi Challenges OpenAI, Compare LLM Pricing, and Better Understand Long-Context LLMs - July 7, 2024

Disclaimer: This content is generated by AI using my social media posts. Make sure to follow me.

This week, Kyutai's Moshi model challenges OpenAI's dominance, a new interactive tool simplifies LLM pricing comparisons, and research sheds light on the long-context performance of various language models.

News:

Moshi: Open Science Takes on OpenAI

Kyutai has released Moshi (demo coming soon), a real-time native multimodal foundation model that listens and speaks, rivaling OpenAI's GPT-4o. This open-source marvel expresses emotions, generates audio, and thinks as it speaks, all with a mere 200ms end-to-end latency. A smaller variant even runs on consumer hardware, potentially democratizing advanced AI capabilities.

Phi-3 Gets a Boost

Microsoft has updated Phi-3 mini with significant improvements in long-context understanding, instruction following, and structured output. These enhancements, achieved through post-training improvements, boost performance across various benchmarks while maintaining the model's MIT license.
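
If you want to try the refreshed weights, here is a minimal loading sketch using Hugging Face transformers. The model id, dtype, and generation settings are assumptions based on the public model card, so adjust them to your setup.

```python
# Minimal sketch for loading the updated Phi-3 mini; model id and settings are
# assumptions from the public model card, not an official recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # updated weights ship under the same id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # Phi-3 shipped with custom modeling code
)

messages = [{"role": "user", "content": "Reply with a JSON object {\"capital\": ...} for France."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```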

LLM Pricing Comparison Made Easy

A new interactive Hugging Face Space allows users to compare LLM pricing across various providers. This tool offers side-by-side comparisons and includes recent additions like Fireworks, Groq, Replicate, and IBM, making it easier to choose the right model for your budget and needs.
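
Under the hood, pricing comparisons come down to simple arithmetic over input and output token rates. Here is a hypothetical helper that makes the calculation explicit; the provider names and prices are placeholders, not quotes from the Space.

```python
# Hypothetical cost helper: the provider names and $/1M-token prices below are
# placeholders for illustration, not quotes from the pricing Space.
def request_cost(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """USD cost of one request given per-million-token prices."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000

# Compare two made-up providers for a 2,000-token prompt with a 500-token reply.
providers = [("provider-a", 0.50, 1.50), ("provider-b", 0.25, 2.00)]
for name, p_in, p_out in providers:
    print(f"{name}: ${request_cost(2_000, 500, p_in, p_out):.6f}")
```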

Research:

RAG Best Practices Unveiled

The paper "Searching for Best Practices in Retrieval-Augmented Generation" provides valuable insights into optimizing RAG systems. While not groundbreaking, it offers recommendations on query classification, chunking techniques, and hybrid retrieval methods to enhance RAG performance.

RLHF: Online Trumps Offline

Google DeepMind's research comparing online and offline RLHF methods reveals that online approaches outperform their offline counterparts. On-policy sampling leads to more diverse and effective data coverage, with online-trained smaller models sometimes surpassing offline-trained larger ones.
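
The mechanism is easy to see in a toy loop: online training keeps sampling pairs from the current policy, so the preference data tracks the policy as it shifts. The bandit-style sketch below is purely illustrative, with made-up rewards, and is not DeepMind's setup.

```python
# Toy illustration of on-policy (online) preference optimization: pairs are
# always sampled from the *current* policy, ranked, and used to update it.
# Bandit-style toy with made-up preferences; not DeepMind's setup.
import math
import random

random.seed(0)
actions = ["concise answer", "long answer", "verbose rambling"]
true_pref = {"concise answer": 1.0, "long answer": 0.8, "verbose rambling": 0.1}
logits = {a: 0.0 for a in actions}

def sample():
    """Softmax sampling from the current policy."""
    weights = [math.exp(logits[a]) for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]

for _ in range(2_000):
    a, b = sample(), sample()                # fresh on-policy samples each step
    if a == b:
        continue
    chosen, rejected = (a, b) if true_pref[a] >= true_pref[b] else (b, a)
    logits[chosen] += 0.01                   # nudge toward the preferred sample
    logits[rejected] -= 0.01

print(max(logits, key=logits.get))           # converges to "concise answer"
```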

Taming Verbose Responses

Meta's LIFT-DPO and MMLAB's iLR-DPO offer solutions to verbosity in preference tuning. LIFT-DPO adds explicit length instructions to prompts, while iLR-DPO applies a length penalty during training; both show promise in improving response quality.
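
The length-penalty idea can be written down in a few lines: subtract a term proportional to the chosen/rejected length gap from the DPO preference margin, so a response cannot win merely by being longer. The PyTorch sketch below is a simplification under that assumption; iLR-DPO's exact objective and hyperparameters are in the paper.

```python
# Simplified length-regularized DPO-style loss: the length gap between chosen
# and rejected responses is subtracted from the preference margin. `alpha` and
# the penalty form are illustrative assumptions, not iLR-DPO's exact objective.
import torch
import torch.nn.functional as F

def length_regularized_dpo_loss(
    pi_chosen_logps, pi_rejected_logps,    # sum of token log-probs under the policy
    ref_chosen_logps, ref_rejected_logps,  # same under the frozen reference model
    chosen_lens, rejected_lens,            # response lengths in tokens
    beta: float = 0.1, alpha: float = 0.01,
):
    margin = (pi_chosen_logps - ref_chosen_logps) - (pi_rejected_logps - ref_rejected_logps)
    length_penalty = alpha * (chosen_lens - rejected_lens)  # discourages winning by length
    return -F.logsigmoid(beta * margin - length_penalty).mean()

# Dummy batch of four preference pairs.
rand = lambda: torch.randn(4)
loss = length_regularized_dpo_loss(
    rand(), rand(), rand(), rand(),
    torch.tensor([120.0, 80.0, 200.0, 64.0]),
    torch.tensor([90.0, 100.0, 150.0, 70.0]),
)
print(loss.item())
```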

General:

LLMs vs RAG: The Long Context Showdown

The Summary of a Haystack (SummHay) benchmark provides intriguing insights into LLMs' performance in long-context scenarios. Gemini 1.5 Pro leads with scores of 37-44%, outshining GPT-4 and Claude 3 Opus, while smaller models paired with RAG sometimes outperform larger counterparts.

Efficient Fine-Tuning for MoE Models

DeepSeek's Expert-Specialized Fine-Tuning (ESFT) is revolutionizing Mixture of Experts (MoE) models. This approach reduces memory usage by up to 90% and training time by up to 30%, maintaining ~98% of full fine-tuning performance and outperforming LoRA by up to 10%.
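
Here is a minimal sketch of the core idea, under the assumption that expert relevance is measured by how often the router selects each expert on task data: rank experts by that count, unfreeze only the top few, and freeze everything else. Module names are hypothetical; this is not DeepSeek's code.

```python
# Sketch of expert-specialized fine-tuning: freeze everything, then unfreeze
# only the experts most relevant to the task. Relevance is approximated here by
# router selection counts; module names are hypothetical, not DeepSeek's code.
import torch

def select_task_experts(router_counts: torch.Tensor, top_k: int) -> list[int]:
    """router_counts[i] = how often expert i was routed to on task samples."""
    return torch.topk(router_counts, k=top_k).indices.tolist()

def freeze_all_but_experts(model: torch.nn.Module, expert_ids: list[int]) -> None:
    for param in model.parameters():
        param.requires_grad = False                    # freeze the whole model...
    for idx in expert_ids:                             # ...then re-enable chosen experts
        for param in model.experts[idx].parameters():  # assumes an `experts` ModuleList
            param.requires_grad = True

class ToyMoELayer(torch.nn.Module):
    def __init__(self, n_experts: int = 8, dim: int = 16):
        super().__init__()
        self.router = torch.nn.Linear(dim, n_experts)
        self.experts = torch.nn.ModuleList(torch.nn.Linear(dim, dim) for _ in range(n_experts))

layer = ToyMoELayer()
counts = torch.tensor([3.0, 40.0, 1.0, 2.0, 55.0, 0.0, 9.0, 4.0])  # fake routing stats
freeze_all_but_experts(layer, select_task_experts(counts, top_k=2))
print([i for i, e in enumerate(layer.experts) if next(e.parameters()).requires_grad])  # [1, 4]
```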

Claude's Hidden Thoughts

A recent discovery suggests that Anthropic's Claude 3.5 Sonnet might be suppressing parts of its thought process using `<antThinking>` tags. This raises important questions about transparency in commercial AI systems and reminds us to approach these tools with a critical eye.


I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week 👋🏻👋🏻