Issue 21: Apple's AI Strategy, Gemma 2 Release, and Serverless Inference Innovations - August 5, 2024

Disclaimer: This content is AI-generated from my social media posts. Make sure to follow them.

This week's highlights include Apple's AI strategy reveal, Google's Gemma 2 release, and Hugging Face's serverless inference service, showcasing significant advancements in AI accessibility and performance.

News

Apple Unveils AI Strategy and Foundation Model Details

Apple has disclosed its AI strategy, emphasizing open-source collaboration and scientific research in developing Apple Intelligence. The tech giant's approach includes a 2.7B on-device model for iPhones and a larger server-based model running on Private Cloud Compute, both built on a dense decoder-only architecture similar to Llama.

Google Releases Gemma 2 2B for On-Device AI

Google has introduced Gemma 2 2B, an LLM optimized for on-device and edge inference. Trained on 2 trillion tokens using knowledge distillation, the model requires less than 1GB of memory with int4 quantization and offers strong performance for its size, scoring 56.7% on IFEval.

Hugging Face and NVIDIA Launch Serverless Inference Service

Hugging Face and NVIDIA have collaborated to launch Inference-as-a-Service, a new feature for Enterprise Hub organizations. Powered by NVIDIA DGX Cloud and NIM microservices, this service offers serverless compute for the latest open LLMs, including Llama 3.1 70B and Mixtral 8x22B.

Writer Introduces Domain-Specific LLMs

Writer has released two domain-specific models, Palmyra-Med and Palmyra-Fin, which outperform general-purpose models such as GPT-4 in their respective domains. These 70B-parameter models, available on Hugging Face, demonstrate the potential of specialized LLMs in the medical and financial fields.

New Synthetic Dataset from Llama 3.1 405B Released

The MagPie-Ultra dataset has been released, featuring 50,000 synthetic instruction pairs generated using Llama 3.1 405B-Instruct FP8. This unfiltered dataset includes quality scores, embeddings, and safety scores, providing valuable training data for future models.

Research

Misconceptions About Llama 3.1 "Distillation" Clarified

Meta's recent release of Llama 3.1 has sparked discussions about the use of "distillation" to improve the smaller versions. The process is more accurately described as imitation learning, i.e., training on synthetic data generated by larger models. To address the potential train-inference mismatch, researchers suggest using online/on-policy knowledge distillation, as demonstrated in Google's Gemma 2 models.
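To make the distinction concrete, here is a minimal, dependency-free sketch of the per-token objective used in logit-based knowledge distillation: the KL divergence between the teacher's and student's next-token distributions. This is a toy illustration with made-up logits, not Meta's or Google's actual training code; in the on-policy variant, the logits would be computed on sequences the student itself sampled.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits):
    """Forward KL divergence KL(teacher || student), the per-token
    objective in logit-based knowledge distillation."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 4-token vocabulary.
teacher_logits = [2.0, 1.0, 0.5, -1.0]
student_logits = [1.5, 1.2, 0.3, -0.5]
loss = kd_loss(teacher_logits, student_logits)  # > 0 while they disagree
```

The key difference from plain imitation learning: instead of only matching hard target tokens sampled from the teacher, the student matches the teacher's full probability distribution, and doing so on the student's own generations (on-policy) keeps the training distribution aligned with what the student sees at inference time.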

General

New Vector Database for On-Device Applications

sqlite-vec, a new SQLite extension, brings vector search capabilities to on-device applications. This C-based, dependency-free tool supports Matryoshka embedding slicing, binary quantization, and various distance calculations, making it ideal for on-device RAG applications.
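The two compression features mentioned above can be illustrated without the extension itself. The pure-Python sketch below shows what Matryoshka slicing (truncating an embedding to its leading dimensions and re-normalizing) and binary quantization (one bit per dimension) do conceptually; it is not sqlite-vec's API, and the vector values are made up.

```python
import math

def matryoshka_slice(embedding, dim):
    # Matryoshka-style truncation: keep the first `dim` components,
    # then re-normalize so cosine similarity still behaves.
    sliced = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in sliced)) or 1.0
    return [x / norm for x in sliced]

def binary_quantize(embedding):
    # 1 bit per dimension (sign): a 32x reduction vs. float32 storage.
    return [1 if x > 0 else 0 for x in embedding]

def hamming_distance(a, b):
    # Distance between two binary-quantized vectors.
    return sum(x != y for x, y in zip(a, b))

vec = [0.4, -0.1, 0.2, 0.7, -0.3, 0.05, -0.6, 0.9]
short = matryoshka_slice(vec, 4)   # 4-dim normalized prefix
bits = binary_quantize(vec)        # [1, 0, 1, 1, 0, 1, 0, 1]
```

Both tricks trade a little recall for much smaller indexes and cheaper distance computations, which is exactly what makes vector search viable on memory-constrained devices.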

Terraform Module Simplifies LLM Deployment on Amazon SageMaker

A new Terraform module streamlines the deployment of open LLMs from Hugging Face to Amazon SageMaker real-time endpoints. This tool simplifies the process of moving AI applications from notebooks to production, handling the IAM role, SageMaker Model, Endpoint Configuration, and autoscaling resources.

Hugging Face Collection for Function Calling Datasets

A new Hugging Face collection features 38 datasets designed to teach LLMs function calling. This resource opens up possibilities for enhancing LLMs' ability to interact with external tools and APIs.


I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week 👋🏻👋🏻