Issue 29: Open LLMs Take Center Stage with AMD GPU Integration and Efficient TPU Deployment - October 15, 2024

Disclaimer: This content is generated by AI from my social media posts. Follow me there for the original posts.

This week's AI newsletter dives into the booming open-source LLM ecosystem, covering efficient deployment strategies, groundbreaking text-to-video models, multi-agent system frameworks, and the latest research in retrieval augmentation and cognitive architectures.

News

Open-Source LLMs Now Powered by AMD GPUs!

Hugging Face launched AMD GPU integration for the Dell Enterprise Hub, empowering users to run open LLMs such as Meta Llama, Google Gemma, and Mistral on AMD GPUs using Hugging Face Text Generation Inference. Optimized containers for AMD MI300X GPUs ensure efficient performance.

One-Click Deployment of Local GGUF Models to the Cloud

Hugging Face Inference Endpoints now natively support llama.cpp, enabling one-click deployment of local models in GGUF format to the cloud (AWS/Azure/GCP) with an OpenAI-compatible endpoint. This feature simplifies cloud deployment and benefits the llama.cpp team directly.
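Because the deployed endpoint speaks the OpenAI wire format, any OpenAI-compatible client can query it. A minimal sketch, assuming a placeholder endpoint URL and model name (neither is a real deployment), built with only the standard library:

```python
# Minimal sketch: building a request for an OpenAI-compatible endpoint that
# serves a GGUF model. The URL, model name, and token are placeholders.
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completions request in the OpenAI wire format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer <HF_TOKEN>",  # hypothetical token
        },
        method="POST",
    )

req = build_chat_request(
    "https://example.endpoints.huggingface.cloud",  # placeholder URL
    "my-model.Q4_K_M.gguf",                         # placeholder model
    "Hello!",
)
```

Sending the request with `urllib.request.urlopen(req)` (or pointing the official `openai` client at the same base URL) would return a standard chat-completions response.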

OpenAI Unveils Swarm: A Multi-Agent System Library

OpenAI released Swarm, a lightweight library for building multi-agent systems, providing a stateless abstraction to manage agent interactions and handoffs without relying on the Assistants API.
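The core pattern Swarm packages is the agent handoff: an agent either answers or passes the turn to another agent, with no shared server-side state. A toy sketch of that pattern, not the Swarm API itself:

```python
# Toy illustration of the stateless multi-agent handoff pattern that Swarm
# implements. This is NOT the Swarm API; it just sketches the idea that an
# agent either returns a reply or returns another agent to hand off to.
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Agent:
    name: str
    respond: Callable[[str], Union[str, "Agent"]]  # reply or handoff target

def run(agent: Agent, message: str, max_handoffs: int = 5) -> str:
    """Route a message through agents until one returns a string reply."""
    for _ in range(max_handoffs):
        result = agent.respond(message)
        if isinstance(result, str):
            return f"[{agent.name}] {result}"
        agent = result  # handoff: the next agent takes over
    raise RuntimeError("too many handoffs")

billing = Agent("billing", lambda m: "Refund issued.")
triage = Agent("triage",
               lambda m: billing if "refund" in m else "How can I help?")

print(run(triage, "I want a refund"))  # triage hands off to billing
```

In the real library, handoffs work the same way conceptually: an agent's tool function returns another `Agent`, and the runner swaps control over to it.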

OpenAI Demystifies Prompt Optimization

OpenAI documented the meta-prompts used in their prompt optimization feature, leveraging best practices and meta-schemas for enhanced prompt generation and JSON/function syntax.

Distilabel 1.4 Released: Enhanced Synthetic Dataset Creation

Distilabel 1.4, an open-source framework for creating synthetic datasets, introduces new features for data manipulation, cost savings, output caching, artifact generation, and new tasks like CLAIR and APIGen.

Research

Contextual Document Embedding: Enhancing Retrieval for RAG

Contextual Document Embedding improves retrieval performance by incorporating neighboring document information during training and encoding, leading to context-aware embeddings that excel in out-of-domain tasks. The associated model and notebook are available on Hugging Face.
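The intuition can be sketched in a few lines: a document's vector is adjusted using the embeddings of its neighboring documents, so corpus-wide context shapes each representation. The real method trains a dedicated two-stage encoder; this toy version only illustrates the idea with bag-of-words vectors and a mean-centering step:

```python
# Toy sketch of the idea behind contextual document embeddings: each
# document vector is shifted relative to the mean of its neighbors, so the
# embedding reflects what is distinctive about the document in its corpus.
# Bag-of-words vectors stand in for a learned encoder.
from collections import Counter

VOCAB = ["cat", "dog", "stock", "market", "pet"]

def embed(text: str) -> list:
    """Naive bag-of-words embedding over a fixed toy vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]

def contextual_embed(doc: str, neighbors: list) -> list:
    """Embed `doc` relative to its corpus: subtract the neighbor mean."""
    vecs = [embed(n) for n in neighbors]
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    return [d - m for d, m in zip(embed(doc), mean)]

corpus = ["cat dog pet", "dog pet", "stock market"]
print(contextual_embed("cat pet", corpus))
```

Centering against neighbors means the same document gets a different embedding in a pet-care corpus than in a finance corpus, which is what helps out-of-domain retrieval.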

Pyramid Flow SD3: An Open-Source Text-to-Video Model

Pyramid Flow SD3, a 2B parameter Diffusion Transformer, generates 10-second, 768p videos at 24fps. This open-source model, under the MIT license, supports both text-to-video and image-to-video generation.

CoALA: Cognitive Architectures for Language Agents

CoALA introduces a structured approach to designing AI agents by integrating cognitive architecture principles with LLMs, focusing on modular memory, structured action spaces, and a generalized decision-making process.

Long-Context Retrieval: LLM Performance Comparison

Databricks Mosaic compared LLM performance in long-context RAG tasks, finding that Gemini 1.5 maintains performance up to 2 million tokens, while other models show varying performance declines.

General

Efficient Open LLM Serving on Google TPUs with Hex-LLM

Hex-LLM, a new LLM serving framework optimized for TPUs, offers low-cost, high-throughput deployment for open models from Hugging Face.

H100 GPU Price Drop: Cloud Implications

Eugene Cheah discusses the significant drop in NVIDIA H100 GPU prices and explores the potential implications for cloud providers.

Prioritizing Evaluation Prompts in AI Development

Starting an AI project? Write your evaluation prompts first to streamline iteration, track improvements, and compare prompts effectively.
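A minimal sketch of that workflow, with a hypothetical `fake_llm` stub standing in for a real model call: define graded cases first, then score every candidate prompt against them so iterations are comparable.

```python
# Minimal "evals first" sketch: write graded test cases before the
# production prompt, then score each candidate prompt against them.
# `fake_llm` is a stand-in for a real completion call.
def fake_llm(prompt: str, question: str) -> str:
    """Deterministic stub so the harness runs without a model."""
    answers = {"2+2": "4", "capital of France": "Paris"}
    return answers.get(question, "unknown")

EVAL_CASES = [
    {"question": "2+2", "expected": "4"},
    {"question": "capital of France", "expected": "Paris"},
]

def score_prompt(prompt: str) -> float:
    """Fraction of eval cases a candidate prompt passes."""
    hits = sum(
        fake_llm(prompt, case["question"]) == case["expected"]
        for case in EVAL_CASES
    )
    return hits / len(EVAL_CASES)

print(score_prompt("Answer concisely."))  # run for every prompt variant
```

With scores recorded per prompt variant, improvements are tracked over time instead of judged by eyeballing individual outputs.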


I hope you enjoyed this newsletter. πŸ€— If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week πŸ‘‹πŸ»πŸ‘‹πŸ»