Issue 16: Google Releases Gemma 2, OpenAI Introduces CriticGPT, and Open LLM Leaderboard 2 Launches - June 30, 2024

Disclaimer: This content is generated by AI from my social media posts. Make sure to follow me there for the originals.

This week's newsletter covers major model releases like Google's Gemma 2 and OpenAI's CriticGPT, alongside the launch of the Open LLM Leaderboard 2 and other innovative AI tools and research.

News

Google Unveils Gemma 2

Google has released Gemma 2, the next iteration of its open language model family, available in 9B and 27B parameter versions. The release introduces architectural changes such as sliding window attention (alternating local and global attention layers) and logit soft-capping, with the 27B model approaching the performance of Meta's much larger Llama 3 70B.
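Logit soft-capping is easy to see in isolation: it squashes logits smoothly into a fixed range with tanh rather than clipping them. A minimal sketch is below; the cap values are the ones reported in the released configuration, but the function itself is illustrative, not Gemma 2's actual code.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    """Smoothly squash logits into (-cap, cap) instead of letting them grow unbounded."""
    return cap * torch.tanh(logits / cap)

# Gemma 2's released config reportedly uses cap=50.0 for attention logits
# and cap=30.0 for the final output logits.
x = torch.tensor([-100.0, -5.0, 0.0, 5.0, 100.0])
print(soft_cap(x, cap=30.0))  # extreme values are pulled back under the cap
```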

Open LLM Leaderboard 2 Launches

The Open LLM Leaderboard has received a major update, introducing new benchmarks such as MMLU-Pro and GPQA, along with an improved ranking system. This refresh aims to provide fairer and more transparent comparisons between language models, with Qwen2 72B Instruct currently leading the pack.

Arcee Releases Qwen2-Based Custom Model

Arcee has introduced Arcee-Spark, a custom model based on Qwen2 7B that outperforms Llama 3 8B Instruct on AGIEval and OpenAI's GPT-3.5 on MT-Bench. The model, released under an Apache 2.0 license, shows how far fine-tuning and model merging can push open-source LLMs; a rough sketch of the merging idea follows below.
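Arcee hasn't published the exact recipe, but model merging in its simplest form is a weighted average of parameter tensors across checkpoints. A minimal sketch with transformers, where the second checkpoint name is a hypothetical placeholder and not one of Arcee-Spark's actual ingredients:

```python
import torch
from transformers import AutoModelForCausalLM

# The fine-tuned checkpoint name below is a stand-in, not Arcee's actual recipe.
model_a = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B-Instruct", torch_dtype=torch.bfloat16)
model_b = AutoModelForCausalLM.from_pretrained("my-org/qwen2-7b-finetune", torch_dtype=torch.bfloat16)

alpha = 0.5  # interpolation weight between the two checkpoints
merged = model_a.state_dict()
for name, param_b in model_b.state_dict().items():
    merged[name] = (1 - alpha) * merged[name] + alpha * param_b

model_a.load_state_dict(merged)
model_a.save_pretrained("qwen2-7b-linear-merge")
```

Real merges typically use a dedicated tool such as Arcee's mergekit, which supports more sophisticated methods than this plain linear interpolation.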

Research

OpenAI's LLM Critics: CriticGPT

OpenAI's latest paper introduces CriticGPT, an autoregressive language model trained to critique question-answer pairs, reminiscent of Anthropic's Constitutional AI method. The critic is trained with RLHF and helps reviewers find more errors in ChatGPT outputs than human experts catch on their own, improving the quality of the resulting training data.

Gemini and Gemma Revive Knowledge Distillation

Google's Gemini and Gemma models are bringing back "online" knowledge distillation for language models, as detailed in a recent research paper. In this setup, a smaller student model learns to match the output distribution of a larger teacher during training rather than learning from hard labels alone, which can improve quality at a given model size.
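As a rough illustration of the loss involved, the sketch below computes a generic token-level distillation term that pushes the student toward the teacher's softened distribution. This is not the Gemini/Gemma training code; in the online variant, the sequences being scored would be sampled from the student itself during training.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """Token-level KD: KL(teacher || student) over the vocabulary.

    Both inputs have shape (batch, seq_len, vocab_size). In the "online"
    variant, the sequences being scored come from the student itself.
    """
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    kl = teacher_probs * (teacher_probs.clamp_min(1e-9).log() - student_log_probs)
    # Sum over the vocabulary, average over batch and sequence positions;
    # t**2 keeps gradient magnitudes comparable across temperatures.
    return kl.sum(dim=-1).mean() * t ** 2
```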

PlanRAG: Enhancing Complex Data Analysis

PlanRAG introduces a novel approach to data analysis with LLMs: the model first generates a plan, then iterates between data retrieval and decision-making. The method shows promising results, outperforming iterative RAG by up to 15.8% in some scenarios and reaching 64.5% accuracy in the Locating scenario and 45.0% in the Building scenario.
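A paraphrase of the loop in plain Python may make the idea concrete. This is my reading of the paper's description, not the authors' code; `llm` and `retrieve` are placeholders for a chat model and a data-query tool.

```python
def plan_rag(question, llm, retrieve, max_steps=8):
    """Sketch of the PlanRAG loop: plan first, then iterate retrieval
    and re-planning until the model commits to a decision."""
    plan = llm(f"Make a step-by-step data-analysis plan for: {question}")
    evidence = []
    for _ in range(max_steps):
        query = llm(f"Plan:\n{plan}\nEvidence so far:\n{evidence}\n"
                    "Write the next data query, or say DECIDE if ready.")
        if query.strip().startswith("DECIDE"):
            break
        evidence.append(retrieve(query))
        # Re-planning: let the model revise the plan based on new evidence.
        plan = llm(f"Revise this plan given the new evidence:\n{plan}\n{evidence[-1]}")
    return llm(f"Question: {question}\nPlan:\n{plan}\nEvidence:\n{evidence}\n"
               "Give the final decision.")
```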

FineWeb: Building Better Web-Based Datasets

The FineWeb paper details the creation of the 15-trillion-token FineWeb dataset and FineWeb-Edu, offering insights into text extraction, filtering, and deduplication for web-based datasets. The comprehensive study provides 17 pages of ablations and technical details on improving dataset quality for language model training.
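One of the steps the paper ablates is fuzzy deduplication. As a generic illustration (not the FineWeb pipeline itself, which is built on different tooling), here is a MinHash-based near-duplicate filter using the datasketch library:

```python
from datasketch import MinHash, MinHashLSH  # pip install datasketch

def doc_minhash(text: str, num_perm: int = 128) -> MinHash:
    """MinHash a document over 5-word shingles."""
    m = MinHash(num_perm=num_perm)
    words = text.lower().split()
    for i in range(max(len(words) - 4, 1)):
        m.update(" ".join(words[i:i + 5]).encode("utf-8"))
    return m

docs = {
    "a": "the quick brown fox jumps over the lazy dog today",
    "b": "the quick brown fox jumps over the lazy dog today friends",  # near-dupe of "a"
    "c": "a completely different document about language model datasets",
}

lsh = MinHashLSH(threshold=0.8, num_perm=128)
kept = []
for doc_id, text in docs.items():
    m = doc_minhash(text)
    if lsh.query(m):      # a near-duplicate is already indexed -> drop this doc
        continue
    lsh.insert(doc_id, m)
    kept.append(doc_id)

print(kept)  # expected: ['a', 'c']
```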

General

MixEval Fork: Streamlined Model Evaluation

An improved fork of MixEval has been released on GitHub, offering evaluation of new models with a single command and support for vLLM. The tool reports a 96% correlation with the LMSYS Chatbot Arena at a fraction of the cost and time, making it a valuable resource for model developers.

Custom Embedding Models on AWS SageMaker

A new guide demonstrates how to train and deploy open embedding models on AWS SageMaker, focusing on financial RAG with NVIDIA SEC filings. The approach leverages Sentence Transformers 3 for training and Text Embedding Inference for fast serving, with training taking only about 10 minutes on an ml.g5.xlarge instance.
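With Sentence Transformers 3, the training side of such a setup boils down to very little code. A minimal sketch, assuming a bge-base model and a toy anchor/positive dataset in place of the guide's synthetic SEC-filings data:

```python
from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Tiny illustrative dataset of (question, relevant passage) pairs --
# the guide builds synthetic Q&A pairs from NVIDIA SEC filings instead.
train_dataset = Dataset.from_dict({
    "anchor": ["What was the total revenue in fiscal 2024?"],
    "positive": ["Revenue for fiscal year 2024 was $60.9 billion, up 126%."],
})

model = SentenceTransformer("BAAI/bge-base-en-v1.5")  # placeholder base model
loss = MultipleNegativesRankingLoss(model)  # uses in-batch negatives

args = SentenceTransformerTrainingArguments(
    output_dir="embedding-finetune",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = SentenceTransformerTrainer(model=model, args=args,
                                     train_dataset=train_dataset, loss=loss)
trainer.train()
model.save_pretrained("embedding-finetune/final")
```

The saved model can then be deployed behind Text Embedding Inference or any other serving stack for retrieval.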


I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week 👋🏻👋🏻