Issue 25: OpenAI's o1 Models Usher in a New Era of Reasoning - September 16, 2024
This week's AI Insights delves into OpenAI's groundbreaking o1 models, exploring advancements in reasoning, reflection tuning, and exciting new models and tools from Google, Arcee, and Hugging Face.
News
OpenAI's o1 Models: A New Era of Reasoning?
OpenAI has unveiled its latest models, o1-preview and o1-mini, designed for enhanced reasoning, as detailed in their blog post (https://openai.com/index/learning-to-reason-with-llms/). The models are trained with reinforcement learning to produce an internal chain of thought before answering, and they offer longer context lengths for deeper analysis. Simon Willison provides a comprehensive analysis of the new models and their implications (https://simonwillison.net/2024/Sep/12/openai-o1/). Interestingly, the reasoning tokens are hidden from the user but still count toward output-token billing.
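For reference, the models are called through the familiar chat completions API. Below is a minimal sketch using the OpenAI Python SDK; the usage field for reasoning tokens is the one documented at launch, so treat it as an assumption if your SDK version differs.

```python
# Minimal sketch: calling o1-preview via the OpenAI Python SDK.
# Note: at launch, o1 models did not accept system messages or a temperature setting.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "How many Rs are in 'strawberry'?"}],
)
print(response.choices[0].message.content)

# The hidden chain of thought is billed as output tokens; the usage object
# breaks them out (field name per the launch docs, assumed here):
print(response.usage.completion_tokens_details.reasoning_tokens)
```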
SuperNova: Distilled Llama 3.1 Shines Bright
Arcee has released SuperNova, a distilled version of Llama 3.1, as explained in their blog (https://blog.arcee.ai/arcee-supernova-training-pipeline-and-model-composition/). Built with offline knowledge distillation, RLHF, and model merging, SuperNova surpasses Meta Llama 3.1 70B Instruct across benchmarks, and it even responds to Reflection-style <thinking> prompts. SuperNova 70B is accessible via API, while the 8B version is available on Hugging Face (https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite). Notably, SuperNova is currently the top-performing open LLM on IFEval.
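If you want to try the 8B distillation yourself, here is a minimal sketch with transformers, assuming the standard Llama 3.1 chat format carries over to the distilled model.

```python
# Minimal sketch: running Llama-3.1-SuperNova-Lite with the transformers chat pipeline.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="arcee-ai/Llama-3.1-SuperNova-Lite",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [{"role": "user", "content": "Which weighs more: a kilogram of steel or a kilogram of feathers?"}]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```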
DataGemma: Leveraging Data Commons for Enhanced Accuracy
Google has introduced two new Gemma 2 models optimized for Data Commons (DC): DataGemma RAG and DataGemma RIG, discussed in their blog post (https://blog.google/technology/ai/google-datagemma-ai-llm/). Both models ground their responses in reliable public statistical data, significantly improving factual accuracy. DataGemma RAG retrieves relevant statistics from Data Commons before generating an answer, while DataGemma RIG interleaves DC queries directly into its generated text. Both models are available on Hugging Face under the Gemma license (DataGemma RAG: https://huggingface.co/google/datagemma-rag-27b-it; DataGemma RIG: https://huggingface.co/google/datagemma-rig-27b-it).
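Since both variants are 27B models, quantized loading makes local experimentation more practical. A minimal sketch, assuming the standard Gemma 2 chat template applies; check the model cards for the recommended prompting setup.

```python
# Minimal sketch: loading DataGemma RAG in 4-bit so the 27B model fits on a single GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/datagemma-rag-27b-it"
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is the population trend of California?"}],
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```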
Piiranha-v1: Open Model for PII Detection
Piiranha-v1, a 280M encoder model, offers robust PII detection capabilities and is available on Hugging Face (https://huggingface.co/iiiorg/piiranha-v1-detect-personal-information). It supports six languages and boasts near-perfect detection accuracy across 17 PII types. Piiranha-v1 achieves a 98.27% PII token detection rate and a 99.44% overall classification accuracy. It's released under the MIT license, making it a valuable tool for data privacy and security.
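Because it's a standard encoder for token classification, it drops straight into the transformers pipeline. A minimal sketch; the exact label names come from the model's config, so inspect those rather than relying on the ones printed here.

```python
# Minimal sketch: detecting PII spans with the token-classification pipeline.
from transformers import pipeline

pii = pipeline(
    "token-classification",
    model="iiiorg/piiranha-v1-detect-personal-information",
    aggregation_strategy="simple",  # merge sub-word tokens into whole entities
)

text = "Contact Jane Doe at jane.doe@example.com or +1-555-0100."
for entity in pii(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```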
GOT: Revolutionizing OCR with Generative AI
GOT (General OCR Theory) is a 580M end-to-end OCR-2.0 model available on Hugging Face (https://huggingface.co/ucaslcl/GOT-OCR2_0). It uses a vision encoder to convert images into tokens and a decoder to generate OCR outputs in various formats. GOT excels in handling complex layouts, including formulas and geometric shapes. Its GitHub repository provides further details (https://github.com/Ucas-HaoranWei/GOT-OCR2.0/).
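Usage goes through the repository's custom remote code rather than a standard transformers task; the chat() helper and its arguments below follow the model card, so double-check them against the card before running.

```python
# Sketch following the model card's custom remote code (not a core transformers API).
from transformers import AutoModel, AutoTokenizer

model_id = "ucaslcl/GOT-OCR2_0"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id,
).eval().cuda()

# Plain-text OCR; the card also documents ocr_type="format" for formatted output.
print(model.chat(tokenizer, "page.jpg", ocr_type="ocr"))
```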
Fish Speech V1.4: Multilingual Text-to-Speech
Fish Speech V1.4, a multilingual text-to-speech model, is now available on Hugging Face (https://huggingface.co/fishaudio/fish-speech-1.4). Trained on 700k hours of audio data, it supports various languages, including English, Chinese, German, and French. You can experience its capabilities through the demo (https://huggingface.co/spaces/fishaudio/fish-speech-1). The code and weights are released under the CC BY-NC-SA 4.0 License.
MagPie: Synthetic Datasets with Self-Reflection
MagPie offers an innovative approach to creating synthetic datasets: prompt an aligned LLM with only the template tokens that open a user turn (an "empty" user input), and the model will generate its own instructions, which can then be answered and enriched with self-reflection tags to produce rich, nuanced training data. The resulting dataset, along with the code, is available on Hugging Face (https://huggingface.co/datasets/thesven/Reflective-MAGLLAMA-v0.1).
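Here's a minimal sketch of that core trick, using a Llama-3.1-style chat template as an illustrative choice (any aligned chat model with a known template works the same way): feed the model just the tokens that open a user turn, let it invent an instruction, then answer that instruction to complete the synthetic pair.

```python
# Minimal sketch of the MagPie-style "empty input" trick.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # illustrative choice
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Step 1: an "empty" user input -- only the header that opens a user turn.
pre_query = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
ids = tok(pre_query, return_tensors="pt", add_special_tokens=False).to(model.device)
gen = model.generate(**ids, max_new_tokens=128, do_sample=True, temperature=1.0)
instruction = tok.decode(gen[0][ids["input_ids"].shape[-1]:], skip_special_tokens=True)

# Step 2: answer the sampled instruction to complete the synthetic pair.
chat = tok.apply_chat_template(
    [{"role": "user", "content": instruction}],
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)
reply = model.generate(chat, max_new_tokens=256)
print(instruction)
print(tok.decode(reply[0][chat.shape[-1]:], skip_special_tokens=True))
```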
Hugging Face Hub's New SQL Editor
The Hugging Face Hub now features an in-browser SQL editor, as described in the documentation (https://huggingface.co/docs/datasets/sql). The tool lets you query, filter, and export any dataset on Hugging Face using SQL, powered by the DuckDB WASM runtime. It's a powerful addition for quick dataset inspection and analysis, and if you're unfamiliar with SQL, you can use Hugging Chat to generate queries from natural language.
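The same queries the in-browser editor runs can be reproduced locally, since DuckDB can read Hub datasets directly via hf:// paths. A minimal sketch; the dataset path below is illustrative.

```python
# Minimal sketch: querying a Hub dataset locally with DuckDB over hf:// paths.
import duckdb

duckdb.sql("""
    SELECT language, COUNT(*) AS n
    FROM 'hf://datasets/some-user/some-dataset/data/train-00000-of-00001.parquet'
    GROUP BY language
    ORDER BY n DESC
    LIMIT 10
""").show()
```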
Apple iPhone 16: On-Device AI and Prompts
The newly released Apple iPhone 16 emphasizes on-device AI and Apple Intelligence. Notably, the prompts used by the Apple Intelligence Adapters for features like priority notifications and summarization are quite sophisticated. These prompts and adapters can be found within the on-device Apple Intelligence Model.
AWS Launches p5e Instances with NVIDIA H200
Amazon Web Services (AWS) has launched p5e instances featuring NVIDIA H200 GPUs, as announced in their blog post (https://aws.amazon.com/de/blogs/machine-learning/amazon-ec2-p5e-instances-are-generally-available/). These instances are available through Amazon EC2 Capacity Blocks for ML, allowing users to reserve GPU capacity for AI workloads.
Research
Exploring the Inner Workings of o1: Quiet-STaR and Reasoning
Could the Quiet-STaR paper from Stanford (https://huggingface.co/papers/2403.09629) offer clues to o1's reasoning process? Quiet-STaR encourages language models to generate "thoughts" for each token to refine their predictions. This method, involving parallel rationale generation, mixing predictions, and rationale optimization, has shown promising results in enhancing reasoning tasks. While not a definitive explanation of o1's mechanics, it provides valuable insights into potential approaches to improve LLM reasoning.
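To make the mixing step concrete, here is a toy sketch of the idea (not the paper's implementation): the final next-token distribution interpolates between the base prediction and the prediction made after a generated rationale, with the weight produced by a learned mixing head.

```python
# Toy sketch of Quiet-STaR-style prediction mixing.
import torch

def mix_predictions(logits_base, logits_with_thought, mixing_weight):
    """logits_*: (batch, vocab); mixing_weight: (batch, 1), in [0, 1], from a learned head."""
    p_base = torch.softmax(logits_base, dim=-1)
    p_thought = torch.softmax(logits_with_thought, dim=-1)
    # When the rationale helps, the head learns to push mixing_weight toward 1.
    return mixing_weight * p_thought + (1.0 - mixing_weight) * p_base
```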
Reflection Tuning: Enhancing LLMs with Introspection
Reflection tuning is gaining traction as a method to enhance the capabilities of language models. A recent paper from ACL 24 (https://huggingface.co/papers/2402.10110) introduces Selective Reflection-Tuning. This technique uses a teacher LLM's reflection and introspection, along with student data selection, to refine instruction data automatically. By incorporating "thinking" tags into the reflection process and using Instruction-Following Difficulty (IFD) for data selection, we might be able to further enhance the effectiveness of reflection tuning.
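For intuition, IFD compares how hard a response is to predict with and without its instruction: a score near 1 means the instruction barely helps, flagging the pair as difficult and therefore informative. Below is a rough sketch of the scoring, with an illustrative scoring model; note that tokenization at the prompt/response seam is handled only approximately here.

```python
# Rough sketch: IFD = PPL(response | instruction) / PPL(response).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B"  # illustrative scoring model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

@torch.no_grad()
def response_loss(prompt: str, response: str) -> float:
    """Mean cross-entropy over the response tokens only."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[-1]
    full_ids = tok(prompt + response, return_tensors="pt").input_ids.to(model.device)
    labels = full_ids.clone()
    labels[:, :prompt_len] = -100  # ignore loss on the prompt positions
    return model(full_ids, labels=labels).loss.item()

def ifd(instruction: str, response: str) -> float:
    ppl_conditioned = math.exp(response_loss(instruction + "\n", response))
    ppl_unconditioned = math.exp(response_loss("", response))
    return ppl_conditioned / ppl_unconditioned
```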
General
Cloud AI Tuesday #3: Deploying Embeddings with Hugging Face TEI
Cloud AI Tuesday #3 focuses on deploying embedding models with Hugging Face TEI for fast inference on Google Cloud. Text Embeddings Inference (TEI) is a solution for efficiently deploying open text embedding and ranking models. Examples for deploying models on Vertex AI and GKE are available on GitHub (Vertex AI: https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai/vertex-notebook.ipynb; GKE: https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tei-deployment).
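Once a TEI endpoint is up (locally or on GKE/Vertex AI), embedding requests are plain HTTP. A minimal sketch against a locally running container; the URL is an assumption for illustration.

```python
# Minimal sketch: requesting embeddings from a running TEI endpoint.
import requests

resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": ["Deploying embedding models with Hugging Face TEI"]},
)
resp.raise_for_status()
embedding = resp.json()[0]  # TEI returns one vector per input string
print(len(embedding))       # dimensionality depends on the deployed model
```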
Cost Considerations: Managed Services vs. VPS
When evaluating managed services versus a less expensive VPS, it's crucial to consider factors beyond the initial cost. Availability, reliability, customer experience, time to market, monitoring capabilities, and hidden costs associated with managing a VPS should all be factored into the decision. While a VPS might seem attractive upfront, potential limitations on growth and revenue should be carefully weighed against the benefits of a managed service.
I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.
See you next week 👋🏻👋🏻