Issue 22: Meta Llama 3 405B Now Serverless & Phi-3.5 Goes Llama! - August 26, 2024

Disclaimer: This content is generated by AI from my social media posts. Make sure to follow me there.

This week in AI brings us serverless access to Meta Llama 3 405B, Microsoft's Phi-3.5 embracing the Llama architecture, and groundbreaking research in long-form content generation with AgentWrite and the LongWriter-6k dataset.

News

Meta Llama 3 405B Deployed on Google Cloud Vertex AI & Hugging Face x NVIDIA NIM API!

Exciting news for those looking to harness the power of Meta Llama 3 405B! You can now deploy it on Google Cloud Vertex AI, giving you full control and GPT-4-level capabilities in-house; learn more in the blog post about deploying Llama 3 405B on Vertex AI. Alternatively, use the Hugging Face x NVIDIA NIM API (https://huggingface.co/blog/inference-dgx-cloud), which offers pay-as-you-go pricing and is available to all Enterprise Hub organizations. Check out the model card for details.

Ultra-Long Outputs with LongWriter-6k

LongWriter-6k is the latest dataset that enables LLMs to generate outputs ranging from 2,000 to 32,000 words. Powered by the AgentWrite pipeline, this dataset helps break down and tackle long writing tasks more effectively. If you're working on projects requiring extensive content generation, this dataset is a must-explore.
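The core idea behind AgentWrite can be sketched as a plan-then-write loop. This is a hypothetical illustration of the decomposition strategy, not the authors' implementation; the prompts and the `llm` interface are assumptions.

```python
# Hypothetical sketch of a plan-then-write pipeline in the spirit of
# AgentWrite: prompts and the llm() interface are illustrative assumptions.

def agent_write(task: str, llm, num_sections: int = 4) -> str:
    """Break a long writing task into a section plan, then draft each
    section in turn, feeding previously written text back as context."""
    # Step 1: ask the model for an outline (one section title per line).
    outline = llm(f"Outline {num_sections} sections for: {task}").splitlines()

    # Step 2: draft sections sequentially, conditioning on what has
    # already been written so the long output stays coherent.
    draft = []
    for title in outline:
        context = "\n".join(draft)
        draft.append(llm(f"Task: {task}\nSo far:\n{context}\nWrite section: {title}"))
    return "\n\n".join(draft)
```

The key point is that no single generation call has to produce 30,000 words; each call only writes one section against an accumulated context.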

Phi-3.5 Models Now Llama-Compatible

Microsoft's new Phi-3.5 models can now be seamlessly converted to the Llama architecture without any performance loss. At roughly 4B parameters, Phi-3.5 matches the performance of Llama 3.1 8B, making this update perfect for anyone already using Llama-optimized tools and scripts.
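At its simplest, such a conversion amounts to renaming checkpoint tensors from one module layout to another (real conversions may also need to split fused projections). The key names below are illustrative assumptions, not the actual Phi-3.5 or Llama parameter names.

```python
# Minimal sketch of an architecture conversion as a state-dict key remap.
# The key names are hypothetical placeholders, not real checkpoint keys.

RENAME_RULES = {
    "model.embed.weight": "model.embed_tokens.weight",
    "model.layers.0.mixer.Wqkv.weight": "model.layers.0.self_attn.qkv_proj.weight",
}

def convert_state_dict(state_dict: dict) -> dict:
    """Return a new state dict with keys renamed to the target layout.
    Tensors are reused unchanged, which is why no performance is lost."""
    return {RENAME_RULES.get(k, k): v for k, v in state_dict.items()}
```

Because only the names change, the converted model is numerically identical to the original.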

Boost Multi-GPU Fine-Tuning with Liger Kernels

The Liger Kernels from LinkedIn promise a 20% boost in multi-GPU training throughput and a 60% reduction in memory usage. These new kernels are designed for efficient fine-tuning of models like Llama 3 and Mistral, enabling faster training with larger batch sizes.

Tackling Small Talk with Everyday Conversations Dataset

The Everyday Conversations dataset by Hugging Face aims to improve LLMs' responses to basic greetings and small talk. Generated using Llama-3.1-70B-Instruct, this dataset includes 2.2k multi-turn conversations across a variety of everyday topics.

Llama 3.1 8B Storm: The Best Fine-Tune Yet?

Llama 3.1 8B Storm might be the most advanced fine-tune of Llama 3.1 8B so far, outperforming the original across multiple benchmarks. Through innovative data curation and selective fine-tuning, this model is pushing the boundaries of what's possible with smaller LLMs.

Research

New Advances in RLHF with CLAIR and APO

The latest research introduces CLAIR and APO, two methods that significantly improve Llama 3.1 8B's performance on complex tasks. These techniques offer refined data preparation and preference optimization, leading to a 7.45% boost on MixEval Hard.
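The intuition behind CLAIR can be sketched as follows: rather than pairing a chosen answer with an unrelated rejected one, a stronger model minimally revises the rejected answer so the preference pair differs only where it matters. This is a conceptual sketch under that reading; the `reviser` function is a stand-in, not the paper's method.

```python
# Rough sketch of contrastive preference-pair construction (CLAIR-style):
# the reviser is a placeholder for a stronger LLM making a minimal edit.

def build_clair_pair(prompt: str, model_answer: str, reviser) -> dict:
    """Create a contrastive preference pair for preference optimization.
    The chosen/rejected pair differs only by the reviser's targeted fix."""
    improved = reviser(prompt, model_answer)  # minimal targeted revision
    return {
        "prompt": prompt,
        "chosen": improved,        # revised, better answer
        "rejected": model_answer,  # the model's original answer
    }
```

Because the two answers share everything except the revised span, the learning signal is far less noisy than with independently sampled pairs.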

Exploring Heterogeneous Mixture of Experts

The HMoE approach introduces a novel method of using experts of varying sizes to handle different token complexities in language modeling. By activating fewer parameters more efficiently, HMoE outperforms conventional MoE models on multiple benchmarks.
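A toy illustration of the routing idea (my reading of HMoE, not the paper's code): experts come in different sizes, and a router sends each token to one expert, so easy tokens can activate fewer parameters. The expert sizes and top-1 routing here are assumptions for illustration.

```python
# Toy heterogeneous-MoE routing: experts of different (hypothetical)
# hidden sizes; each token activates only its top-1 expert.

EXPERT_SIZES = [128, 256, 1024]  # assumed per-expert hidden sizes

def route(token_scores):
    """Pick the top-1 expert per token from router scores and report the
    expert index plus its size (a proxy for activated parameters)."""
    assignments = []
    for scores in token_scores:  # one list of router scores per token
        expert = max(range(len(scores)), key=lambda i: scores[i])
        assignments.append((expert, EXPERT_SIZES[expert]))
    return assignments
```

In a homogeneous MoE every routing decision costs the same; here a token routed to the smallest expert activates an eighth of the parameters of one routed to the largest.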

The Pruning & Distillation Debate Continues

NVIDIA's Mistral-NeMo-Minitron 8B reignites the debate over pruning and distillation versus training smaller models from scratch. This distilled model outperforms its larger counterparts, making a strong case for efficient model deployment.
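The distillation half of that recipe typically means training the pruned student to match the teacher's softened output distribution rather than only hard labels. Below is a schematic of standard knowledge distillation (not NVIDIA's exact recipe); the temperature value is an assumption.

```python
import math

# Schematic knowledge-distillation loss: KL divergence between the
# teacher's and student's temperature-softened output distributions.

def softmax(logits, T=1.0):
    """Softmax over logits softened by temperature T."""
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * T * T
```

A higher temperature exposes the teacher's "dark knowledge" about relative probabilities of wrong classes, which is much of what the student learns from.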

General

AI-Powered Roguelike Game: Everchanging Quest

Everchanging Quest is an AI-powered roguelike game that uses LLMs to dynamically generate maps, dungeons, and quests. Currently powered by Google DeepMind's Gemini 1.5 Pro, this game showcases how LLMs can be used in creative and entertainment fields.

Celebrating 5 Million Users on Hugging Face

Hugging Face has hit a major milestone: 5 million users! This growing community of AI builders, developers, and data scientists is pushing the boundaries of open AI. Next stop: 100 million!

Google Launches Serverless GPUs on Cloud Run

Google has made a groundbreaking move by launching serverless GPUs for containers and functions on Google Cloud Run, supporting NVIDIA L4 GPUs. This makes AI deployments more scalable and cost-effective, without the overhead of server management.


I hope you enjoyed this newsletter. 🤗 If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week 👋🏻👋🏻