<?xml version="1.0" encoding="UTF-8" ?>
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>philschmid.de - RSS feed</title>
        <link>https://www.philschmid.de</link>
        <description>RSS feed for my blog www.philschmid.de</description>
  <item>
  <title>How to correctly use MCP servers with your AI Agents</title>
  <link>https://www.philschmid.de/use-mcp-servers</link>
  <guid>https://www.philschmid.de/use-mcp-servers</guid>
  <description>MCP servers are not dead. Blindly enabling them bloats your context, which leads to higher cost and worse performance. Here are two proven patterns on how to correctly use MCP servers and avoid the bloat.</description>
  <pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>8 Tips for Writing Agent Skills</title>
  <link>https://www.philschmid.de/agent-skills-tips</link>
  <guid>https://www.philschmid.de/agent-skills-tips</guid>
  <description>8 Tips for Writing Agent Skills. Know What a Skill Is, Nail the Description, Write Instructions, Keep It Lean, Set the Right Level of Freedom, Don't Skip Negative Cases, Test It Before You Ship It, Know When to Retire a Skill.</description>
  <pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to use Gemma 4 with the Gemini API and Google AI Studio</title>
  <link>https://www.philschmid.de/gemma-4-gemini-api</link>
  <guid>https://www.philschmid.de/gemma-4-gemini-api</guid>
  <description>Learn how to use Gemma 4 with the Gemini API and Google AI Studio.</description>
  <pubDate>Tue, 07 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How Kimi, Cursor, and Chroma Train Agentic Models with RL</title>
  <link>https://www.philschmid.de/kimi-composer-context</link>
  <guid>https://www.philschmid.de/kimi-composer-context</guid>
  <description>Learn the unique ways how Kimi, Cursor, and Chroma train agentic models with RL.</description>
  <pubDate>Sat, 28 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Combine Built-in Tools and Function Calling in the Gemini Interactions API</title>
  <link>https://www.philschmid.de/tool-combo</link>
  <guid>https://www.philschmid.de/tool-combo</guid>
  <description>Learn how to combine built-in tools and function calling in the Gemini Interactions API.</description>
  <pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Developer Guide: Nano Banana 2 with the Gemini Interactions API</title>
  <link>https://www.philschmid.de/nano-banana-2-interactions-api</link>
  <guid>https://www.philschmid.de/nano-banana-2-interactions-api</guid>
  <description>Learn how to use the Gemini Interactions API to build a personalized Japan travel brochure with Nano Banana 2.</description>
  <pubDate>Mon, 16 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How Autoresearch will change Small Language Models adoption</title>
  <link>https://www.philschmid.de/autoresearch</link>
  <guid>https://www.philschmid.de/autoresearch</guid>
  <description>Autoresearch lets an AI agent run hundreds of model training experiments overnight. Learn how it works, early results from Karpathy and Shopify, and how to apply it.</description>
  <pubDate>Tue, 10 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Practical Guide to Evaluating and Testing Agent Skills</title>
  <link>https://www.philschmid.de/testing-skills</link>
  <guid>https://www.philschmid.de/testing-skills</guid>
  <description>Learn how to systematically test and improve agent skills using deterministic checks and a real-world Gemini API example.</description>
  <pubDate>Wed, 04 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Writing a Good AGENTS.md</title>
  <link>https://www.philschmid.de/writing-good-agents</link>
  <guid>https://www.philschmid.de/writing-good-agents</guid>
  <description>Learn what to include, what to skip, and how to structure your AGENTS.md for best results.</description>
  <pubDate>Tue, 24 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Agents: Inner Loop vs Outer Loop</title>
  <link>https://www.philschmid.de/inner-loop-vs-outer-loop</link>
  <guid>https://www.philschmid.de/inner-loop-vs-outer-loop</guid>
  <description>Most agent frameworks share the same hardcoded tool loop; what differs is how the model uses it. This post explains the inner loop—an agent verifying its own work within a task—and the outer loop—an agent carrying lessons across tasks via persistent memory, skills, and rules files—and why both are needed for agents that feel reliable and get smarter over time.</description>
  <pubDate>Fri, 20 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Can We Close the Loop in 2026?</title>
  <link>https://www.philschmid.de/closing-the-loop</link>
  <guid>https://www.philschmid.de/closing-the-loop</guid>
  <description>What makes some AI agents feel like collaborators while others need constant babysitting? Two capabilities matter: self-awareness — does the agent understand what it is and how to use its tools — and closing the loop — can it verify its own work before responding. This post breaks down where agents stand today, how production systems like Spotify scaffold verification, and what needs to improve for agents to earn real autonomy in 2026.</description>
  <pubDate>Tue, 17 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Multimodal Function Calling with Gemini 3 and Interactions API</title>
  <link>https://www.philschmid.de/interactions-multimodal-fc</link>
  <guid>https://www.philschmid.de/interactions-multimodal-fc</guid>
  <description>Multimodal function calling allows tools to return images the model can process natively, similar to how you pass images in prompts. Instead of describing what's in a file, your tool returns the actual image and Gemini 3 processes it natively.</description>
  <pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Getting Started with Gemini Deep Research API</title>
  <link>https://www.philschmid.de/gemini-deep-research-getting-started</link>
  <guid>https://www.philschmid.de/gemini-deep-research-getting-started</guid>
  <description>Learn how to use the new Gemini Deep Research agent via the Interactions API to perform complex research tasks, generate images based on the findings, and translate the results.</description>
  <pubDate>Mon, 02 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>The Agent Client Protocol Overview</title>
  <link>https://www.philschmid.de/acp-overview</link>
  <guid>https://www.philschmid.de/acp-overview</guid>
  <description>The Agent Client Protocol (ACP) is an open standard that abstracts the events and outputs of AI agents and provides a common interface for editors to interact with them. Similar to MCP, but for agent-to-client (UI) communication.</description>
  <pubDate>Sun, 01 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Gemini Interactions API Quick Start</title>
  <link>https://www.philschmid.de/interactions-api-quickstart</link>
  <guid>https://www.philschmid.de/interactions-api-quickstart</guid>
  <description>The Interactions API is a unified interface for building with Gemini models and agents. It simplifies the development of agentic applications by handling server-side state management, tool orchestration, and long-running tasks.</description>
  <pubDate>Thu, 22 Jan 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>MCP is Not the Problem, It's your Server: Best Practices for Building MCP Servers</title>
  <link>https://www.philschmid.de/mcp-best-practices</link>
  <guid>https://www.philschmid.de/mcp-best-practices</guid>
  <description>The Model Context Protocol (MCP) exploded roughly a year ago, and everyone rushed to build MCP servers. The hype was real. Yet most MCP servers disappoint, and most developers blame the protocol, which now feels like it's dying on social media.</description>
  <pubDate>Wed, 21 Jan 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Transparent PNG Stickers with Nano Banana Pro and Gemini interactions API</title>
  <link>https://www.philschmid.de/generate-stickers</link>
  <guid>https://www.philschmid.de/generate-stickers</guid>
  <description>Learn how to generate transparent PNG stickers using Nano Banana Pro and the Gemini Interactions API, featuring chromakey green background removal with HSV detection.</description>
  <pubDate>Mon, 19 Jan 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Building Agents with the Gemini Interactions API</title>
  <link>https://www.philschmid.de/building-agents-interactions-api</link>
  <guid>https://www.philschmid.de/building-agents-interactions-api</guid>
  <description>Learn how to build AI agents using the new Gemini Interactions API, featuring server-side state management and simplified tool orchestration.</description>
  <pubDate>Wed, 14 Jan 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Introducing MCP CLI: A way to call MCP Servers Efficiently</title>
  <link>https://www.philschmid.de/mcp-cli</link>
  <guid>https://www.philschmid.de/mcp-cli</guid>
  <description>mcp-cli is a lightweight CLI that allows dynamic discovery of MCP servers, reducing token consumption while making tool interactions more efficient for AI coding agents.</description>
  <pubDate>Fri, 09 Jan 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>The importance of Agent Harness in 2026</title>
  <link>https://www.philschmid.de/agent-harness-2026</link>
  <guid>https://www.philschmid.de/agent-harness-2026</guid>
  <description>In 2026, Agent Harnesses will become essential for building reliable AI systems that can handle complex, multi-day tasks.</description>
  <pubDate>Mon, 05 Jan 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>8 Predictions for 2026. What comes next in AI?</title>
  <link>https://www.philschmid.de/2026-predictions</link>
  <guid>https://www.philschmid.de/2026-predictions</guid>
  <description>8 Predictions for 2026, exploring the future of AI, personal agents, smart homes, and more.</description>
  <pubDate>Wed, 31 Dec 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Context Engineering for AI Agents: Part 2</title>
  <link>https://www.philschmid.de/context-engineering-part-2</link>
  <guid>https://www.philschmid.de/context-engineering-part-2</guid>
  <description>Building on the foundations of Context Engineering, this post explores advanced strategies to manage context rot, multi-agent coordination, and action space optimization for AI agents.</description>
  <pubDate>Thu, 04 Dec 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Why (Senior) Engineers Struggle to Build AI Agents</title>
  <link>https://www.philschmid.de/why-engineers-struggle-building-agents</link>
  <guid>https://www.philschmid.de/why-engineers-struggle-building-agents</guid>
  <description>Traditional software engineering is deterministic, while AI agents operate probabilistically. This fundamental difference creates challenges for engineers accustomed to strict interfaces and predictable outcomes.</description>
  <pubDate>Wed, 26 Nov 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Practical Guide on how to build an Agent from scratch with Gemini 3</title>
  <link>https://www.philschmid.de/building-agents</link>
  <guid>https://www.philschmid.de/building-agents</guid>
  <description>A step-by-step practical guide on building AI agents using Gemini 3 Pro, covering tool integration, context management, and best practices for creating effective and reliable agents.</description>
  <pubDate>Fri, 21 Nov 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Gemini 3 Prompting: Best Practices for General Usage</title>
  <link>https://www.philschmid.de/gemini-3-prompt-practices</link>
  <guid>https://www.philschmid.de/gemini-3-prompt-practices</guid>
  <description>A comprehensive guide on best practices for prompting Gemini 3, focusing on clarity, structure, reasoning, and agentic tool use to maximize model performance across various domains.</description>
  <pubDate>Wed, 19 Nov 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Gemini API File Search: A Web Developer Tutorial</title>
  <link>https://www.philschmid.de/gemini-file-search-javascript</link>
  <guid>https://www.philschmid.de/gemini-file-search-javascript</guid>
  <description>Learn how to use the Gemini API File Search tool with JavaScript/TypeScript to build a Retrieval-Augmented Generation (RAG) system.</description>
  <pubDate>Fri, 07 Nov 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Build your first AI Agent with Gemini, n8n and Google Cloud Run</title>
  <link>https://www.philschmid.de/n8n-cloud-run-gemini</link>
  <guid>https://www.philschmid.de/n8n-cloud-run-gemini</guid>
  <description>Learn how to deploy n8n on Google Cloud Run with PostgreSQL and create an AI Agent using Google Gemini 2.5.</description>
  <pubDate>Thu, 30 Oct 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>AI Agent Benchmark Compendium</title>
  <link>https://www.philschmid.de/benchmark-compedium</link>
  <guid>https://www.philschmid.de/benchmark-compedium</guid>
  <description>An extensive compendium of over 50 benchmarks for evaluating AI agents, categorized into Function Calling and Tool Use, General Assistant and Reasoning, Coding and Software Engineering, and Computer Interaction.</description>
  <pubDate>Wed, 15 Oct 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Agents 2.0: From Shallow Loops to Deep Agents</title>
  <link>https://www.philschmid.de/agents-2.0-deep-agents</link>
  <guid>https://www.philschmid.de/agents-2.0-deep-agents</guid>
  <description>An overview of the architectural shift from Shallow Agents (Agent 1.0) to Deep Agents (Agent 2.0) and how to build complex AI agents that can handle multi-step tasks over extended periods.</description>
  <pubDate>Sun, 12 Oct 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>The Rise of Subagents</title>
  <link>https://www.philschmid.de/the-rise-of-subagents</link>
  <guid>https://www.philschmid.de/the-rise-of-subagents</guid>
  <description>Subagents are on the rise in the AI community: we are seeing more and more use of subagents to reliably handle specific user goals.</description>
  <pubDate>Mon, 15 Sep 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>The 10 Steps for product AI generation with Gemini 2.5 Flash</title>
  <link>https://www.philschmid.de/gemini-image-generation-product</link>
  <guid>https://www.philschmid.de/gemini-image-generation-product</guid>
  <description>Learn how to use Gemini 2.5 Flash for product image generation.</description>
  <pubDate>Wed, 27 Aug 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Memory in Agents: Make LLMs Remember</title>
  <link>https://www.philschmid.de/memory-in-agents</link>
  <guid>https://www.philschmid.de/memory-in-agents</guid>
  <description>Learn how to engineer long-term memory into stateless AI agents to overcome their biggest limitation and unlock true personalization.</description>
  <pubDate>Mon, 04 Aug 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Google Gemini CLI Cheatsheet</title>
  <link>https://www.philschmid.de/gemini-cli-cheatsheet</link>
  <guid>https://www.philschmid.de/gemini-cli-cheatsheet</guid>
  <description>A comprehensive cheatsheet on using Google's Gemini CLI, covering installation, authentication, configuration, and core commands.</description>
  <pubDate>Thu, 24 Jul 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Code Sandbox MCP: A Simple Code Interpreter for Your AI Agents</title>
  <link>https://www.philschmid.de/code-sandbox-mcp</link>
  <guid>https://www.philschmid.de/code-sandbox-mcp</guid>
  <description>Code Sandbox MCP is a simple, self-hosted code interpreter for your AI agents. It allows you to execute code snippets in containerized environments.</description>
  <pubDate>Tue, 22 Jul 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Integrating Long-Term Memory with Gemini 2.5</title>
  <link>https://www.philschmid.de/gemini-with-memory</link>
  <guid>https://www.philschmid.de/gemini-with-memory</guid>
  <description>This guide shows you how to add long-term memory to your Gemini 2.5 chatbot using the Gemini API and Mem0.</description>
  <pubDate>Thu, 03 Jul 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>The New Skill in AI is Not Prompting, It's Context Engineering</title>
  <link>https://www.philschmid.de/context-engineering</link>
  <guid>https://www.philschmid.de/context-engineering</guid>
  <description>Context Engineering is the new skill in AI. It is about providing the right information and tools, in the right format, at the right time.</description>
  <pubDate>Mon, 30 Jun 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Single vs Multi-Agent System?</title>
  <link>https://www.philschmid.de/single-vs-multi-agents</link>
  <guid>https://www.philschmid.de/single-vs-multi-agents</guid>
  <description>Single vs. multi-agent? The real secret to building AI agents is 'read vs. write'. Learn which to use for your task and build reliable systems.</description>
  <pubDate>Fri, 20 Jun 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Zero to One: Learning Agentic Patterns</title>
  <link>https://www.philschmid.de/agentic-pattern</link>
  <guid>https://www.philschmid.de/agentic-pattern</guid>
  <description>Learn common agentic design patterns and workflows for building robust, scalable AI applications, understanding when to use each.</description>
  <pubDate>Mon, 05 May 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Google Gemini LangChain Cheatsheet</title>
  <link>https://www.philschmid.de/gemini-langchain-cheatsheet</link>
  <guid>https://www.philschmid.de/gemini-langchain-cheatsheet</guid>
  <description>A comprehensive cheatsheet on using Google's Gemini within LangChain, covering chat functionalities with multimodal inputs, tool usage, structured data generation, and text embedding techniques.</description>
  <pubDate>Mon, 28 Apr 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>OpenAI Codex CLI, how does it work?</title>
  <link>https://www.philschmid.de/openai-codex-cli</link>
  <guid>https://www.philschmid.de/openai-codex-cli</guid>
  <description>I used Gemini 2.5 Pro to better understand the OpenAI Codex CLI, a tool that allows you to interact with an AI model directly in your terminal to perform coding tasks.</description>
  <pubDate>Thu, 17 Apr 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Model Context Protocol (MCP) an overview</title>
  <link>https://www.philschmid.de/mcp-introduction</link>
  <guid>https://www.philschmid.de/mcp-introduction</guid>
  <description>An overview of the Model Context Protocol (MCP): how it works, what MCP servers and clients are, and how to use it.</description>
  <pubDate>Thu, 03 Apr 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>ReAct agent from scratch with Gemini 2.5 and LangGraph</title>
  <link>https://www.philschmid.de/langgraph-gemini-2-5-react-agent</link>
  <guid>https://www.philschmid.de/langgraph-gemini-2-5-react-agent</guid>
  <description>Build a ReAct agent from scratch with Gemini 2.5 and LangGraph.</description>
  <pubDate>Mon, 31 Mar 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Pass@k vs Pass^k: Understanding Agent Reliability</title>
  <link>https://www.philschmid.de/agents-pass-at-k-pass-power-k</link>
  <guid>https://www.philschmid.de/agents-pass-at-k-pass-power-k</guid>
  <description>Production agents need to be reliable. Why pass^k is a better metric than pass@k.</description>
  <pubDate>Mon, 24 Mar 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Google Gemma 3 Function Calling Example</title>
  <link>https://www.philschmid.de/gemma-function-calling</link>
  <guid>https://www.philschmid.de/gemma-function-calling</guid>
  <description>Learn how to use function calling with Google DeepMind Gemma 3 27B It.</description>
  <pubDate>Fri, 14 Mar 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Function Calling Guide: Google DeepMind Gemini 2.0 Flash</title>
  <link>https://www.philschmid.de/gemini-function-calling</link>
  <guid>https://www.philschmid.de/gemini-function-calling</guid>
  <description>Learn how to use function calling with Google DeepMind Gemini 2.0 Flash.</description>
  <pubDate>Wed, 05 Mar 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>From PDFs to Insights: Structured Outputs from PDFs with Gemini 2.0</title>
  <link>https://www.philschmid.de/gemini-pdf-to-data</link>
  <guid>https://www.philschmid.de/gemini-pdf-to-data</guid>
  <description>Learn how to extract structured data from PDFs with Gemini 2.0 and Pydantic.</description>
  <pubDate>Fri, 07 Feb 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Mini-R1: Reproduce the Deepseek R1 "aha moment", an RL tutorial</title>
  <link>https://www.philschmid.de/mini-deepseek-r1</link>
  <guid>https://www.philschmid.de/mini-deepseek-r1</guid>
  <description>Reproduce the Deepseek R1 "aha moment" and train an open model using reinforcement learning, trying to teach it self-verification and search abilities on its own to solve the Countdown Game.</description>
  <pubDate>Thu, 30 Jan 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to align open LLMs in 2025 with DPO and synthetic data</title>
  <link>https://www.philschmid.de/rl-with-llms-in-2025-dpo</link>
  <guid>https://www.philschmid.de/rl-with-llms-in-2025-dpo</guid>
  <description>Learn how to align LLMs using Hugging Face TRL and RLHF through Direct Preference Optimization (DPO) and on-policy synthetic data.</description>
  <pubDate>Thu, 23 Jan 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Bite: How Deepseek R1 was trained</title>
  <link>https://www.philschmid.de/deepseek-r1</link>
  <guid>https://www.philschmid.de/deepseek-r1</guid>
  <description>A 5-minute read on how Deepseek R1 was trained using Group Relative Policy Optimization (GRPO) and an RL-focused multi-stage training approach.</description>
  <pubDate>Fri, 17 Jan 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to use Anthropic MCP Server with open LLMs, OpenAI or Google Gemini</title>
  <link>https://www.philschmid.de/mcp-example-llama</link>
  <guid>https://www.philschmid.de/mcp-example-llama</guid>
  <description>How to use Anthropic MCP Server with open LLMs, OpenAI or Google Gemini</description>
  <pubDate>Fri, 17 Jan 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tune classifier with ModernBERT in 2025</title>
  <link>https://www.philschmid.de/fine-tune-modern-bert-in-2025</link>
  <guid>https://www.philschmid.de/fine-tune-modern-bert-in-2025</guid>
  <description>Modern updated guide on how to fine-tune BERT models for classification tasks in 2025.</description>
  <pubDate>Wed, 25 Dec 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to fine-tune open LLMs in 2025 with Hugging Face</title>
  <link>https://www.philschmid.de/fine-tune-llms-in-2025</link>
  <guid>https://www.philschmid.de/fine-tune-llms-in-2025</guid>
  <description>The only guide you need to fine-tune open LLMs in 2025, including QLoRA, Spectrum, Flash Attention, Liger Kernels and more.</description>
  <pubDate>Fri, 20 Dec 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy QwQ-32B-Preview the best open Reasoning Model on AWS with Hugging Face</title>
  <link>https://www.philschmid.de/sagemaker-deploy-qwq</link>
  <guid>https://www.philschmid.de/sagemaker-deploy-qwq</guid>
  <description>Qwen's QwQ-32B-Preview is the best open model for mathematical and programming reasoning, directly competing with OpenAI o1.</description>
  <pubDate>Tue, 03 Dec 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Llama 3.2 Vision on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-llama32-vision</link>
  <guid>https://www.philschmid.de/sagemaker-llama32-vision</guid>
  <description>Learn how to deploy Llama 3.2 Vision on Amazon SageMaker and run inference with it.</description>
  <pubDate>Thu, 17 Oct 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to Fine-Tune Multimodal Models or VLMs with Hugging Face TRL</title>
  <link>https://www.philschmid.de/fine-tune-multimodal-llms-with-trl</link>
  <guid>https://www.philschmid.de/fine-tune-multimodal-llms-with-trl</guid>
  <description>Learn how to fine-tune multimodal models like Llama 3.2 Vision or Qwen 2 VL to create custom image-to-text generation models.</description>
  <pubDate>Mon, 30 Sep 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Evaluate open LLMs with Vertex AI and Gemini</title>
  <link>https://www.philschmid.de/evaluate-llm-with-gemini</link>
  <guid>https://www.philschmid.de/evaluate-llm-with-gemini</guid>
  <description>Evaluate Llama 3.1 8B on Vertex AI with Gemini 1.5 Pro as LLM as a Judge using the Gen AI Evaluation Service.</description>
  <pubDate>Tue, 24 Sep 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Evaluate LLMs using Evaluation Harness and Hugging Face TGI/vLLM</title>
  <link>https://www.philschmid.de/evaluate-llms-with-lm-eval-and-tgi-vllm</link>
  <guid>https://www.philschmid.de/evaluate-llms-with-lm-eval-and-tgi-vllm</guid>
  <description>Evaluate Llama 3.1 8B Instruct on IFEval and GSM8K benchmarks with Chain of Thought reasoning using Evaluation Harness and Hugging Face TGI/vLLM.</description>
  <pubDate>Thu, 19 Sep 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy open LLMs with Terraform and Amazon SageMaker</title>
  <link>https://www.philschmid.de/terraform-llm-sagemaker</link>
  <guid>https://www.philschmid.de/terraform-llm-sagemaker</guid>
  <description>Learn how to deploy open large language models (LLMs) like Llama 3.1 8B Instruct to Amazon SageMaker using Terraform Infrastructure as Code (IaC).</description>
  <pubDate>Mon, 05 Aug 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>LLM Evaluation doesn't need to be complicated</title>
  <link>https://www.philschmid.de/llm-evaluation</link>
  <guid>https://www.philschmid.de/llm-evaluation</guid>
  <description>LLM Evaluation doesn't need to be complicated. You don't need complex pipelines, databases or infrastructure components to get started building an effective evaluation pipeline.</description>
  <pubDate>Thu, 11 Jul 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Evaluating Open LLMs with MixEval: The Closest Benchmark to LMSYS Chatbot Arena</title>
  <link>https://www.philschmid.de/evaluate-llm-mixeval</link>
  <guid>https://www.philschmid.de/evaluate-llm-mixeval</guid>
  <description>Evaluate open LLMs with MixEval, the closest benchmark to LMSYS Chatbot Arena for a fraction of the cost and time.</description>
  <pubDate>Fri, 28 Jun 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Train and Deploy open Embedding Models on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-train-deploy-embedding-models</link>
  <guid>https://www.philschmid.de/sagemaker-train-deploy-embedding-models</guid>
  <description>Learn how to fine-tune and deploy a custom embedding model on Amazon SageMaker using the new Hugging Face Embedding Container.</description>
  <pubDate>Tue, 25 Jun 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Mixtral 8x7B on AWS Inferentia2 with Hugging Face Optimum</title>
  <link>https://www.philschmid.de/inferentia2-mixtral-8x7b</link>
  <guid>https://www.philschmid.de/inferentia2-mixtral-8x7b</guid>
  <description>Deploy Mixtral 8x7B to AWS Inferentia2 with Hugging Face Optimum on Amazon SageMaker and benchmark it with llmperf.</description>
  <pubDate>Tue, 18 Jun 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tune Llama 3 with PyTorch FSDP and Q-Lora on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-train-deploy-llama3</link>
  <guid>https://www.philschmid.de/sagemaker-train-deploy-llama3</guid>
  <description>Train and deploy Llama 3 on Amazon SageMaker using PyTorch FSDP and Q-Lora</description>
  <pubDate>Tue, 11 Jun 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tune Embedding models for Retrieval Augmented Generation (RAG)</title>
  <link>https://www.philschmid.de/fine-tune-embedding-model-for-rag</link>
  <guid>https://www.philschmid.de/fine-tune-embedding-model-for-rag</guid>
  <description>Customizing embedding models for domain-specific data can significantly boost the retrieval performance of your RAG Application.</description>
  <pubDate>Tue, 04 Jun 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Understanding the Cost of Generative AI Models in Production</title>
  <link>https://www.philschmid.de/cost-generative-ai</link>
  <guid>https://www.philschmid.de/cost-generative-ai</guid>
  <description>Discussion of the cost of deploying Generative AI models is often shallow: many people are fixated on raw compute pricing. But in reality, the cost is much more complex.</description>
  <pubDate>Mon, 27 May 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Llama 3 70B on AWS Inferentia2 with Hugging Face Optimum</title>
  <link>https://www.philschmid.de/inferentia2-llama3-70b</link>
  <guid>https://www.philschmid.de/inferentia2-llama3-70b</guid>
  <description>Learn how to deploy Llama 3 70B on AWS Inferentia2 with Hugging Face Optimum on Amazon SageMaker.</description>
  <pubDate>Thu, 23 May 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy open LLMs with vLLM on Hugging Face Inference Endpoints</title>
  <link>https://www.philschmid.de/vllm-inference-endpoints</link>
  <guid>https://www.philschmid.de/vllm-inference-endpoints</guid>
  <description>In this blog post, we will show you how to deploy open LLMs with vLLM on Hugging Face Inference Endpoints.</description>
  <pubDate>Thu, 02 May 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Efficiently fine-tune Llama 3 with PyTorch FSDP and Q-Lora</title>
  <link>https://www.philschmid.de/fsdp-qlora-llama3</link>
  <guid>https://www.philschmid.de/fsdp-qlora-llama3</guid>
  <description>Learn how to fine-tune Llama 3 70b with PyTorch FSDP and Q-Lora using Hugging Face TRL, Transformers, PEFT and Datasets.</description>
  <pubDate>Mon, 22 Apr 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Llama 3 on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-llama3</link>
  <guid>https://www.philschmid.de/sagemaker-llama3</guid>
  <description>In this blog post you will learn how to deploy Llama 3 70B to Amazon SageMaker.</description>
  <pubDate>Thu, 18 Apr 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Accelerate Mixtral 8x7B with Speculative Decoding and Quantization on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-awq-medusa</link>
  <guid>https://www.philschmid.de/sagemaker-awq-medusa</guid>
  <description>In this blog post you will learn how to accelerate Mixtral using Speculative Decoding (Medusa) and Quantization (AWQ).</description>
  <pubDate>Tue, 02 Apr 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Llama 2 70B on AWS Inferentia2 with Hugging Face Optimum</title>
  <link>https://www.philschmid.de/inferentia2-llama-70b-inference</link>
  <guid>https://www.philschmid.de/inferentia2-llama-70b-inference</guid>
  <description>In this blog post you will learn how to deploy Meta Llama 2 70B on AWS Inferentia2 with Hugging Face Optimum on Amazon SageMaker.</description>
  <pubDate>Tue, 26 Mar 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-Tune and Evaluate LLMs in 2024 with Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-train-evalaute-llms-2024</link>
  <guid>https://www.philschmid.de/sagemaker-train-evalaute-llms-2024</guid>
  <description>In this blog post you will learn how to fine-tune open LLMs from Hugging Face using Amazon SageMaker.</description>
  <pubDate>Tue, 12 Mar 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Evaluate LLMs with Hugging Face Lighteval on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-evaluate-llm-lighteval</link>
  <guid>https://www.philschmid.de/sagemaker-evaluate-llm-lighteval</guid>
  <description>In this blog post you will learn how to evaluate LLMs using Hugging Face lighteval on Amazon SageMaker.</description>
  <pubDate>Tue, 05 Mar 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to fine-tune Google Gemma with ChatML and Hugging Face TRL</title>
  <link>https://www.philschmid.de/fine-tune-google-gemma</link>
  <guid>https://www.philschmid.de/fine-tune-google-gemma</guid>
  <description>In this blog post you will learn how to fine tune Google Gemma using Hugging Face Transformers, Datasets and TRL.</description>
  <pubDate>Fri, 01 Mar 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>RLHF in 2024 with DPO and Hugging Face</title>
  <link>https://www.philschmid.de/dpo-align-llms-in-2024-with-trl</link>
  <guid>https://www.philschmid.de/dpo-align-llms-in-2024-with-trl</guid>
  <description>In this blog post you will learn how to align LLMs using Hugging Face TRL and RLHF through Direct Preference Optimization (DPO).</description>
  <pubDate>Tue, 23 Jan 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to Fine-Tune LLMs in 2024 with Hugging Face</title>
  <link>https://www.philschmid.de/fine-tune-llms-in-2024-with-trl</link>
  <guid>https://www.philschmid.de/fine-tune-llms-in-2024-with-trl</guid>
  <description>In this blog post you will learn how to fine-tune LLMs using Hugging Face TRL, Transformers and Datasets in 2024. We will fine-tune an LLM on a text-to-SQL dataset.</description>
  <pubDate>Tue, 23 Jan 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Scale LLM Inference on Amazon SageMaker with Multi-Replica Endpoints</title>
  <link>https://www.philschmid.de/sagemaker-multi-replica</link>
  <guid>https://www.philschmid.de/sagemaker-multi-replica</guid>
  <description>In this blog post you will learn how to increase the throughput of Llama 13B on Amazon SageMaker using single instance multi-replica endpoints.</description>
  <pubDate>Thu, 11 Jan 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tune Llama 7B on AWS Trainium</title>
  <link>https://www.philschmid.de/fine-tune-llama-7b-trainium</link>
  <guid>https://www.philschmid.de/fine-tune-llama-7b-trainium</guid>
  <description>In this blog post you will learn how to fine-tune Llama 7B on AWS Trainium using the Hugging Face Optimum Neuron library.</description>
  <pubDate>Thu, 21 Dec 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Programmatically manage 🤗 Inference Endpoints</title>
  <link>https://www.philschmid.de/inference-endpoints-iac</link>
  <guid>https://www.philschmid.de/inference-endpoints-iac</guid>
  <description>In this blog post you will learn how to use the huggingface_hub library to create, send requests to, pause, and delete Hugging Face Inference Endpoints.</description>
  <pubDate>Wed, 20 Dec 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Mixtral 8x7B on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-deploy-mixtral</link>
  <guid>https://www.philschmid.de/sagemaker-deploy-mixtral</guid>
  <description>In this blog post you will learn how to deploy Mixtral 8x7B to Amazon SageMaker.</description>
  <pubDate>Tue, 12 Dec 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Embedding Models on AWS Inferentia2 with Amazon SageMaker</title>
  <link>https://www.philschmid.de/inferentia2-embeddings</link>
  <guid>https://www.philschmid.de/inferentia2-embeddings</guid>
  <description>In this blog post, you will learn how to compile and deploy Embedding Models on AWS Inferentia2.</description>
  <pubDate>Tue, 21 Nov 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Llama 2 7B on AWS Inferentia2 with Amazon SageMaker</title>
  <link>https://www.philschmid.de/inferentia2-llama-7b</link>
  <guid>https://www.philschmid.de/inferentia2-llama-7b</guid>
  <description>In this blog post, you will learn how to compile and deploy Llama 2 7B on AWS Inferentia2 with Amazon SageMaker.</description>
  <pubDate>Tue, 14 Nov 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Stable Diffusion XL on AWS Inferentia2 with Amazon SageMaker</title>
  <link>https://www.philschmid.de/inferentia2-stable-diffusion-xl</link>
  <guid>https://www.philschmid.de/inferentia2-stable-diffusion-xl</guid>
  <description>In this blog post, you will learn how to compile and deploy Stable Diffusion XL on AWS Inferentia2 with Amazon SageMaker.</description>
  <pubDate>Tue, 07 Nov 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Amazon Bedrock: How good (bad) is Titan Embeddings?</title>
  <link>https://www.philschmid.de/amazon-titan-embeddings</link>
  <guid>https://www.philschmid.de/amazon-titan-embeddings</guid>
  <description>In this blog post I take a closer look at the Amazon Bedrock Titan embeddings model and how well (or badly) it performs.</description>
  <pubDate>Fri, 03 Nov 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Evaluate LLMs and RAG a practical example using Langchain and Hugging Face</title>
  <link>https://www.philschmid.de/evaluate-llm</link>
  <guid>https://www.philschmid.de/evaluate-llm</guid>
  <description>Learn how to evaluate LLMs and RAG pipelines using Langchain and Hugging Face</description>
  <pubDate>Mon, 30 Oct 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Idefics 9B and 80B on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-idefics</link>
  <guid>https://www.philschmid.de/sagemaker-idefics</guid>
  <description>Learn how to deploy Hugging Face Idefics 9B and 80B to Amazon SageMaker and send requests with images and text to the model.</description>
  <pubDate>Thu, 12 Oct 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Train and Deploy Mistral 7B with Hugging Face on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-mistral</link>
  <guid>https://www.philschmid.de/sagemaker-mistral</guid>
  <description>Learn how to fine-tune and deploy Mistral 7B with Hugging Face on Amazon SageMaker and leverage techniques like QLoRA, Flash Attention, and response streaming.</description>
  <pubDate>Thu, 05 Oct 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Llama 2 on Amazon SageMaker a Benchmark</title>
  <link>https://www.philschmid.de/sagemaker-llama-benchmark</link>
  <guid>https://www.philschmid.de/sagemaker-llama-benchmark</guid>
  <description>Benchmark evaluating varying sizes of Llama 2 on a range of Amazon EC2 instance types with different load levels on latency (ms per token), and throughput (tokens per second).</description>
  <pubDate>Tue, 26 Sep 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA and Flash Attention</title>
  <link>https://www.philschmid.de/deepspeed-lora-flash-attention</link>
  <guid>https://www.philschmid.de/deepspeed-lora-flash-attention</guid>
  <description>In this example we will show how to fine-tune Falcon 180B using DeepSpeed, Hugging Face Transformers, LoRA with Flash Attention on a multi-GPU machine.</description>
  <pubDate>Wed, 20 Sep 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tune Falcon 180B with QLoRA and Flash Attention on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-falcon-180b-qlora</link>
  <guid>https://www.philschmid.de/sagemaker-falcon-180b-qlora</guid>
  <description>Learn how to fine-tune Falcon 180B with QLoRA and Flash Attention on Amazon SageMaker.</description>
  <pubDate>Tue, 12 Sep 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Falcon 180B on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-falcon-180b</link>
  <guid>https://www.philschmid.de/sagemaker-falcon-180b</guid>
  <description>Learn how to deploy Falcon 180B to Amazon SageMaker and how to create a chatbot with streaming inference.</description>
  <pubDate>Thu, 07 Sep 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Optimize open LLMs using GPTQ and Hugging Face Optimum</title>
  <link>https://www.philschmid.de/gptq-llama</link>
  <guid>https://www.philschmid.de/gptq-llama</guid>
  <description>Learn how to quantize Llama 2 7B with GPTQ to use 4x less memory.</description>
  <pubDate>Thu, 31 Aug 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>LLMOps: Deploy Open LLMs using Infrastructure as Code with AWS CDK</title>
  <link>https://www.philschmid.de/cdk-llama2</link>
  <guid>https://www.philschmid.de/cdk-llama2</guid>
  <description>Learn how to use Infrastructure as Code with AWS CDK to deploy and manage Llama 2.</description>
  <pubDate>Tue, 15 Aug 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Llama 2 7B/13B/70B on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-llama-llm</link>
  <guid>https://www.philschmid.de/sagemaker-llama-llm</guid>
  <description>Learn how to deploy Llama 2 models (7B - 70B) to Amazon SageMaker using the Hugging Face LLM Inference DLC.</description>
  <pubDate>Mon, 07 Aug 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Introducing EasyLLM - streamline open LLMs</title>
  <link>https://www.philschmid.de/introducing-easyllm</link>
  <guid>https://www.philschmid.de/introducing-easyllm</guid>
  <description>EasyLLM is an open-source project that provides helpful tools and methods for working with large language models (LLMs).</description>
  <pubDate>Thu, 03 Aug 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Extended Guide: Instruction-tune Llama 2</title>
  <link>https://www.philschmid.de/instruction-tune-llama-2</link>
  <guid>https://www.philschmid.de/instruction-tune-llama-2</guid>
  <description>This blog post is an extended guide on instruction-tuning Llama 2 from Meta AI</description>
  <pubDate>Wed, 26 Jul 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>LLaMA 2 - Every Resource you need</title>
  <link>https://www.philschmid.de/llama-2</link>
  <guid>https://www.philschmid.de/llama-2</guid>
  <description>All resources for LLaMA 2: how to test, train, and deploy it.</description>
  <pubDate>Fri, 21 Jul 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tune LLaMA 2 (7-70B) on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-llama2-qlora</link>
  <guid>https://www.philschmid.de/sagemaker-llama2-qlora</guid>
  <description>Learn how to train LLaMA 2 using QLoRA and Hugging Face Transformers on Amazon SageMaker.</description>
  <pubDate>Tue, 18 Jul 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Train LLMs using QLoRA on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-falcon-qlora</link>
  <guid>https://www.philschmid.de/sagemaker-falcon-qlora</guid>
  <description>Learn how to train LLMs using QLoRA on Amazon SageMaker</description>
  <pubDate>Thu, 13 Jul 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy LLMs with Hugging Face Inference Endpoints</title>
  <link>https://www.philschmid.de/endpoints-llm</link>
  <guid>https://www.philschmid.de/endpoints-llm</guid>
  <description>Learn how to deploy LLMs using Hugging Face Inference Endpoints</description>
  <pubDate>Tue, 04 Jul 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Optimize and Deploy BERT on AWS Inferentia2</title>
  <link>https://www.philschmid.de/optimize-deploy-bert-inf2</link>
  <guid>https://www.philschmid.de/optimize-deploy-bert-inf2</guid>
  <description>Learn how to optimize and deploy BERT on AWS Inferentia2</description>
  <pubDate>Wed, 28 Jun 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Securely deploy LLMs inside VPCs with Hugging Face and Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-llm-vpc</link>
  <guid>https://www.philschmid.de/sagemaker-llm-vpc</guid>
  <description>Learn how to deploy LLMs into VPCs from S3 with Amazon SageMaker using the new Hugging Face LLM Inference DLC.</description>
  <pubDate>Tue, 20 Jun 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy Falcon 7B and 40B on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-falcon-llm</link>
  <guid>https://www.philschmid.de/sagemaker-falcon-llm</guid>
  <description>Learn how to deploy Falcon 40B to Amazon SageMaker using the new Hugging Face LLM Inference DLC.</description>
  <pubDate>Wed, 07 Jun 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tune BERT for Text Classification on AWS Trainium</title>
  <link>https://www.philschmid.de/getting-started-trainium</link>
  <guid>https://www.philschmid.de/getting-started-trainium</guid>
  <description>Learn how to fine-tune Hugging Face Transformers using AWS Trainium.</description>
  <pubDate>Tue, 06 Jun 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Introducing the Hugging Face LLM Inference Container for Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-huggingface-llm</link>
  <guid>https://www.philschmid.de/sagemaker-huggingface-llm</guid>
  <description>Learn how to deploy open-source LLMs, like BLOOM, to Amazon SageMaker for inference using the new Hugging Face LLM Inference Container.</description>
  <pubDate>Wed, 31 May 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Generative AI for Document Understanding with Hugging Face and Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-donut</link>
  <guid>https://www.philschmid.de/sagemaker-donut</guid>
  <description>Learn how to fine-tune Donut-base, a generative AI model for document understanding/document parsing, using Hugging Face Transformers and Amazon SageMaker.</description>
  <pubDate>Tue, 23 May 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to scale LLM workloads to 20B+ with Amazon SageMaker using Hugging Face and PyTorch FSDP</title>
  <link>https://www.philschmid.de/sagemaker-fsdp-gpt</link>
  <guid>https://www.philschmid.de/sagemaker-fsdp-gpt</guid>
  <description>Learn how to fine-tune LLMs on multi-node setups using Amazon SageMaker and Hugging Face Transformers with PyTorch FSDP</description>
  <pubDate>Tue, 02 May 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Setting up AWS Trainium for Hugging Face Transformers</title>
  <link>https://www.philschmid.de/setup-aws-trainium</link>
  <guid>https://www.philschmid.de/setup-aws-trainium</guid>
  <description>Learn how to quickly set up an AWS Trainium instance using the Hugging Face Neuron Deep Learning AMI and fine-tune BERT.</description>
  <pubDate>Tue, 25 Apr 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Train and Deploy BLOOM with Amazon SageMaker and PEFT</title>
  <link>https://www.philschmid.de/bloom-sagemaker-peft</link>
  <guid>https://www.philschmid.de/bloom-sagemaker-peft</guid>
  <description>Learn how to fine-tune BLOOMZ 7B with Amazon SageMaker on a single GPU using LoRA and Hugging Face Transformers.</description>
  <pubDate>Thu, 13 Apr 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Introducing IGEL an instruction-tuned German large Language Model</title>
  <link>https://www.philschmid.de/introducing-igel</link>
  <guid>https://www.philschmid.de/introducing-igel</guid>
  <description>IGEL (Instruction-based German Language Model) is an LLM designed for German language understanding tasks, including sentiment analysis, language translation, and question answering.</description>
  <pubDate>Tue, 04 Apr 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Efficient Large Language Model training with LoRA and Hugging Face</title>
  <link>https://www.philschmid.de/fine-tune-flan-t5-peft</link>
  <guid>https://www.philschmid.de/fine-tune-flan-t5-peft</guid>
  <description>Learn how to fine-tune Google's FLAN-T5 XXL on a Single GPU using LoRA And Hugging Face Transformers.</description>
  <pubDate>Thu, 23 Mar 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy FLAN-UL2 20B on Amazon SageMaker</title>
  <link>https://www.philschmid.de/deploy-flan-ul2-sagemaker</link>
  <guid>https://www.philschmid.de/deploy-flan-ul2-sagemaker</guid>
  <description>Learn how to deploy Google's FLAN-UL2 20B on Amazon SageMaker for inference.</description>
  <pubDate>Mon, 20 Mar 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Getting started with PyTorch 2.0 and Hugging Face Transformers</title>
  <link>https://www.philschmid.de/getting-started-pytorch-2-0-transformers</link>
  <guid>https://www.philschmid.de/getting-started-pytorch-2-0-transformers</guid>
  <description>Learn how to get started with PyTorch 2.0 and Hugging Face Transformers and reduce your training time by up to 2x.</description>
  <pubDate>Thu, 16 Mar 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Controlled text-to-image generation with ControlNet on Inference Endpoints</title>
  <link>https://www.philschmid.de/stable-diffusion-controlnet-endpoint</link>
  <guid>https://www.philschmid.de/stable-diffusion-controlnet-endpoint</guid>
  <description>Learn how to deploy ControlNet Stable Diffusion Pipeline on Hugging Face Inference Endpoints to generate controlled images.</description>
  <pubDate>Fri, 03 Mar 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Combine Amazon SageMaker and DeepSpeed to fine-tune FLAN-T5 XXL</title>
  <link>https://www.philschmid.de/sagemaker-deepspeed</link>
  <guid>https://www.philschmid.de/sagemaker-deepspeed</guid>
  <description>Learn how to fine-tune Google's FLAN-T5 XXL on Amazon SageMaker using DeepSpeed and Hugging Face Transformers.</description>
  <pubDate>Wed, 22 Feb 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tune FLAN-T5 XL/XXL using DeepSpeed and Hugging Face Transformers</title>
  <link>https://www.philschmid.de/fine-tune-flan-t5-deepspeed</link>
  <guid>https://www.philschmid.de/fine-tune-flan-t5-deepspeed</guid>
  <description>Learn how to fine-tune Google's FLAN-T5 XXL using DeepSpeed and Hugging Face Transformers.</description>
  <pubDate>Thu, 16 Feb 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy FLAN-T5 XXL on Amazon SageMaker</title>
  <link>https://www.philschmid.de/deploy-flan-t5-sagemaker</link>
  <guid>https://www.philschmid.de/deploy-flan-t5-sagemaker</guid>
  <description>Learn how to deploy Google's FLAN-T5 XXL on Amazon SageMaker for inference.</description>
  <pubDate>Wed, 08 Feb 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Hugging Face Transformers Examples</title>
  <link>https://www.philschmid.de/huggingface-transformers-examples</link>
  <guid>https://www.philschmid.de/huggingface-transformers-examples</guid>
  <description>Learn how to leverage Hugging Face Transformers to easily fine-tune your models.</description>
  <pubDate>Thu, 26 Jan 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Getting started with Transformers and TPU using PyTorch</title>
  <link>https://www.philschmid.de/getting-started-tpu-transformers</link>
  <guid>https://www.philschmid.de/getting-started-tpu-transformers</guid>
  <description>Learn how to get started with Hugging Face Transformers and TPUs using PyTorch, and fine-tune a BERT model for text classification using the newest Google Cloud TPUs.</description>
  <pubDate>Mon, 16 Jan 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tune FLAN-T5 for chat and dialogue summarization</title>
  <link>https://www.philschmid.de/fine-tune-flan-t5</link>
  <guid>https://www.philschmid.de/fine-tune-flan-t5</guid>
  <description>Learn how to fine-tune Google's FLAN-T5 for chat and dialogue summarization using Hugging Face Transformers.</description>
  <pubDate>Tue, 27 Dec 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Managed Transcription with OpenAI Whisper and Hugging Face Inference Endpoints</title>
  <link>https://www.philschmid.de/whisper-inference-endpoints</link>
  <guid>https://www.philschmid.de/whisper-inference-endpoints</guid>
  <description>Learn how to deploy OpenAI Whisper for speech recognition and transcription using Hugging Face Inference Endpoints.</description>
  <pubDate>Tue, 20 Dec 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Stable Diffusion Inpainting example with Hugging Face Inference Endpoints</title>
  <link>https://www.philschmid.de/stable-diffusion-inpainting-inference-endpoints</link>
  <guid>https://www.philschmid.de/stable-diffusion-inpainting-inference-endpoints</guid>
  <description>Learn how to deploy Stable Diffusion 2.0 Inpainting on Hugging Face Inference Endpoints to manipulate images.</description>
  <pubDate>Thu, 15 Dec 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Stable Diffusion with Hugging Face Inference Endpoints</title>
  <link>https://www.philschmid.de/stable-diffusion-inference-endpoints</link>
  <guid>https://www.philschmid.de/stable-diffusion-inference-endpoints</guid>
  <description>Learn how to deploy Stable Diffusion 2.0 on Hugging Face Inference Endpoints to generate images from text.</description>
  <pubDate>Mon, 28 Nov 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Document AI: LiLT, a better language-agnostic LayoutLM model</title>
  <link>https://www.philschmid.de/fine-tuning-lilt</link>
  <guid>https://www.philschmid.de/fine-tuning-lilt</guid>
  <description>Learn how to fine-tune LiLT (Language-independent Layout Transformer) for document understanding/document parsing using Hugging Face Transformers.</description>
  <pubDate>Tue, 22 Nov 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Multi-Model GPU Inference with Hugging Face Inference Endpoints</title>
  <link>https://www.philschmid.de/multi-model-inference-endpoints</link>
  <guid>https://www.philschmid.de/multi-model-inference-endpoints</guid>
  <description>Learn how to deploy multiple models to a single GPU with Hugging Face multi-model inference endpoints.</description>
  <pubDate>Thu, 17 Nov 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Serverless Machine Learning Applications with Hugging Face Gradio and AWS Lambda</title>
  <link>https://www.philschmid.de/serverless-gradio</link>
  <guid>https://www.philschmid.de/serverless-gradio</guid>
  <description>Learn how to deploy a Hugging Face Gradio Application using Hugging Face Transformers to AWS Lambda for serverless workloads.</description>
  <pubDate>Tue, 15 Nov 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Accelerate Stable Diffusion inference with DeepSpeed-Inference on GPUs</title>
  <link>https://www.philschmid.de/stable-diffusion-deepspeed-inference</link>
  <guid>https://www.philschmid.de/stable-diffusion-deepspeed-inference</guid>
  <description>Learn how to optimize Stable Diffusion for GPU inference with one line of code using Hugging Face Diffusers and DeepSpeed.</description>
  <pubDate>Tue, 08 Nov 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Stable Diffusion on Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-stable-diffusion</link>
  <guid>https://www.philschmid.de/sagemaker-stable-diffusion</guid>
  <description>Learn how to deploy Stable Diffusion to Amazon SageMaker to generate images.</description>
  <pubDate>Tue, 01 Nov 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy T5 11B for inference for less than $500</title>
  <link>https://www.philschmid.de/deploy-t5-11b</link>
  <guid>https://www.philschmid.de/deploy-t5-11b</guid>
  <description>Learn how to deploy T5 11B on a single GPU using Hugging Face Inference Endpoints.</description>
  <pubDate>Tue, 25 Oct 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Outperform OpenAI GPT-3 with SetFit for text-classification</title>
  <link>https://www.philschmid.de/getting-started-setfit</link>
  <guid>https://www.philschmid.de/getting-started-setfit</guid>
  <description>Learn how to use SetFit to create a text-classification model with only `8` labeled samples per class, or `32` samples in total. You will also learn how to improve your model using hyperparameter tuning.</description>
  <pubDate>Tue, 18 Oct 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tuning LayoutLM for document-understanding using Keras and Hugging Face Transformers</title>
  <link>https://www.philschmid.de/fine-tuning-layoutlm-keras</link>
  <guid>https://www.philschmid.de/fine-tuning-layoutlm-keras</guid>
  <description>Learn how to fine-tune LayoutLM for document understanding using Keras and Hugging Face Transformers.</description>
  <pubDate>Thu, 13 Oct 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy LayoutLM with Hugging Face Inference Endpoints</title>
  <link>https://www.philschmid.de/inference-endpoints-layoutlm</link>
  <guid>https://www.philschmid.de/inference-endpoints-layoutlm</guid>
  <description>Learn how to deploy LayoutLM for document understanding using Hugging Face Inference Endpoints.</description>
  <pubDate>Thu, 06 Oct 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Document AI: Fine-tuning LayoutLM for document-understanding using Hugging Face Transformers</title>
  <link>https://www.philschmid.de/fine-tuning-layoutlm</link>
  <guid>https://www.philschmid.de/fine-tuning-layoutlm</guid>
  <description>Learn how to fine-tune LayoutLM for document understanding using Hugging Face Transformers. LayoutLM is a transformer for document image understanding and information extraction.</description>
  <pubDate>Tue, 04 Oct 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Custom Inference with Hugging Face Inference Endpoints</title>
  <link>https://www.philschmid.de/custom-inference-handler</link>
  <guid>https://www.philschmid.de/custom-inference-handler</guid>
  <description>Welcome to this tutorial on how to create a custom inference handler for Hugging Face Inference Endpoints.</description>
  <pubDate>Thu, 29 Sep 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Accelerate GPT-J inference with DeepSpeed-Inference on GPUs</title>
  <link>https://www.philschmid.de/gptj-deepspeed-inference</link>
  <guid>https://www.philschmid.de/gptj-deepspeed-inference</guid>
  <description>Learn how to optimize GPT-J for GPU inference with one line of code using Hugging Face Transformers and DeepSpeed.</description>
  <pubDate>Tue, 13 Sep 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Document AI: Fine-tuning Donut for document-parsing using Hugging Face Transformers</title>
  <link>https://www.philschmid.de/fine-tuning-donut</link>
  <guid>https://www.philschmid.de/fine-tuning-donut</guid>
  <description>Learn how to fine-tune Donut-base for document understanding/document parsing using Hugging Face Transformers. Donut is a new document-understanding model that achieves state-of-the-art performance and can be used for commercial applications.</description>
  <pubDate>Tue, 06 Sep 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Use Sentence Transformers with TensorFlow</title>
  <link>https://www.philschmid.de/tensorflow-sentence-transformers</link>
  <guid>https://www.philschmid.de/tensorflow-sentence-transformers</guid>
  <description>Learn how to use a Sentence Transformers model with TensorFlow and Keras to create document embeddings.</description>
  <pubDate>Tue, 30 Aug 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Pre-Training BERT with Hugging Face Transformers and Habana Gaudi</title>
  <link>https://www.philschmid.de/pre-training-bert-habana</link>
  <guid>https://www.philschmid.de/pre-training-bert-habana</guid>
  <description>Learn how to pre-train BERT from scratch using Hugging Face Transformers and Habana Gaudi.</description>
  <pubDate>Wed, 24 Aug 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Accelerate BERT inference with DeepSpeed-Inference on GPUs</title>
  <link>https://www.philschmid.de/bert-deepspeed-inference</link>
  <guid>https://www.philschmid.de/bert-deepspeed-inference</guid>
  <description>Learn how to optimize BERT for GPU inference with one line of code using Hugging Face Transformers and DeepSpeed.</description>
  <pubDate>Tue, 16 Aug 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Accelerate Sentence Transformers with Hugging Face Optimum</title>
  <link>https://www.philschmid.de/optimize-sentence-transformers</link>
  <guid>https://www.philschmid.de/optimize-sentence-transformers</guid>
  <description>Learn how to optimize Sentence Transformers using Hugging Face Optimum. You will learn how to dynamically quantize and optimize a Sentence Transformer for ONNX Runtime.</description>
  <pubDate>Tue, 02 Aug 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deep Learning setup made easy with EC2 Remote Runner and Habana Gaudi</title>
  <link>https://www.philschmid.de/habana-gaudi-ec2-runner</link>
  <guid>https://www.philschmid.de/habana-gaudi-ec2-runner</guid>
  <description>Learn how to migrate your training jobs to a Habana Gaudi-based DL1 instance on AWS using EC2 Remote Runner.</description>
  <pubDate>Tue, 26 Jul 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Accelerate Vision Transformer (ViT) with Quantization using Optimum</title>
  <link>https://www.philschmid.de/optimizing-vision-transformer</link>
  <guid>https://www.philschmid.de/optimizing-vision-transformer</guid>
  <description>Learn how to optimize Vision Transformer (ViT) using Hugging Face Optimum. You will learn how to dynamically quantize a ViT model for ONNX Runtime.</description>
  <pubDate>Tue, 19 Jul 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Optimizing Transformers for GPUs with Optimum</title>
  <link>https://www.philschmid.de/optimizing-transformers-with-optimum-gpu</link>
  <guid>https://www.philschmid.de/optimizing-transformers-with-optimum-gpu</guid>
  <description>Learn how to optimize Hugging Face Transformers models for NVIDIA GPUs using Optimum. You will learn how to optimize a DistilBERT model for ONNX Runtime.</description>
  <pubDate>Wed, 13 Jul 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Hugging Face Transformers and Habana Gaudi AWS DL1 Instances</title>
  <link>https://www.philschmid.de/habana-distributed-training</link>
  <guid>https://www.philschmid.de/habana-distributed-training</guid>
  <description>Learn how to fine-tune XLM-RoBERTa for multilingual multi-class text classification using a Habana Gaudi-based DL1 instance.</description>
  <pubDate>Tue, 05 Jul 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Optimizing Transformers with Hugging Face Optimum</title>
  <link>https://www.philschmid.de/optimizing-transformers-with-optimum</link>
  <guid>https://www.philschmid.de/optimizing-transformers-with-optimum</guid>
  <description>Learn how to optimize Hugging Face Transformers models using Optimum. The session will show you how to dynamically quantize and optimize a DistilBERT model using Hugging Face Optimum and ONNX Runtime. Hugging Face Optimum is an extension of 🤗 Transformers, providing a set of performance optimization tools enabling maximum efficiency to train and run models on targeted hardware.</description>
  <pubDate>Thu, 30 Jun 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Convert Transformers to ONNX with Hugging Face Optimum</title>
  <link>https://www.philschmid.de/convert-transformers-to-onnx</link>
  <guid>https://www.philschmid.de/convert-transformers-to-onnx</guid>
  <description>Introduction guide about ONNX and Transformers. Learn how to convert transformers like BERT to ONNX and what you can do with it.</description>
  <pubDate>Tue, 21 Jun 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Setup Deep Learning environment for Hugging Face Transformers with Habana Gaudi on AWS</title>
  <link>https://www.philschmid.de/getting-started-habana-gaudi</link>
  <guid>https://www.philschmid.de/getting-started-habana-gaudi</guid>
  <description>Learn how to set up a Deep Learning environment for Hugging Face Transformers with Habana Gaudi on AWS using the DL1 instance type.</description>
  <pubDate>Tue, 14 Jun 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Static Quantization with Hugging Face `optimum` for ~3x latency improvements</title>
  <link>https://www.philschmid.de/static-quantization-optimum</link>
  <guid>https://www.philschmid.de/static-quantization-optimum</guid>
  <description>Learn how to do post-training static quantization on Hugging Face Transformers model with `optimum` to achieve up to 3x latency improvements.</description>
  <pubDate>Tue, 07 Jun 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Advanced PII detection and anonymization with Hugging Face Transformers and Amazon SageMaker</title>
  <link>https://www.philschmid.de/pii-huggingface-sagemaker</link>
  <guid>https://www.philschmid.de/pii-huggingface-sagemaker</guid>
  <description>Learn how to do advanced PII detection and anonymization with Hugging Face Transformers and Amazon SageMaker.</description>
  <pubDate>Tue, 31 May 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>An Amazon SageMaker Inference comparison with Hugging Face Transformers</title>
  <link>https://www.philschmid.de/sagemaker-inference-comparison</link>
  <guid>https://www.philschmid.de/sagemaker-inference-comparison</guid>
  <description>Learn about the different existing Amazon SageMaker Inference options and how to use them.</description>
  <pubDate>Tue, 17 May 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Semantic Segmentation with Hugging Face's Transformers and Amazon SageMaker</title>
  <link>https://www.philschmid.de/image-segmentation-sagemaker</link>
  <guid>https://www.philschmid.de/image-segmentation-sagemaker</guid>
  <description>Learn how to do image segmentation with Hugging Face Transformers, SegFormer and Amazon SageMaker.</description>
  <pubDate>Tue, 03 May 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Automatic Speech Recognition with Hugging Face's Transformers and Amazon SageMaker</title>
  <link>https://www.philschmid.de/automatic-speech-recognition-sagemaker</link>
  <guid>https://www.philschmid.de/automatic-speech-recognition-sagemaker</guid>
  <description>Learn how to do automatic speech recognition/speech-to-text with Hugging Face Transformers, Wav2vec2 and Amazon SageMaker.</description>
  <pubDate>Thu, 28 Apr 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Serverless Inference with Hugging Face's Transformers, DistilBERT and Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-serverless-huggingface-distilbert</link>
  <guid>https://www.philschmid.de/sagemaker-serverless-huggingface-distilbert</guid>
  <description>Learn how to deploy a Transformer model like BERT to Amazon SageMaker Serverless using the Python SageMaker SDK.</description>
  <pubDate>Thu, 21 Apr 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Accelerated document embeddings with Hugging Face Transformers and AWS Inferentia</title>
  <link>https://www.philschmid.de/huggingface-sentence-transformers-aws-inferentia</link>
  <guid>https://www.philschmid.de/huggingface-sentence-transformers-aws-inferentia</guid>
  <description>Learn how to accelerate Sentence Transformers inference using Hugging Face Transformers and AWS Inferentia.</description>
  <pubDate>Tue, 19 Apr 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Save up to 90% training cost with AWS Spot Instances and Hugging Face Transformers</title>
  <link>https://www.philschmid.de/sagemaker-spot-instance</link>
  <guid>https://www.philschmid.de/sagemaker-spot-instance</guid>
  <description>Learn how to leverage AWS Spot Instances when training Hugging Face Transformers with Amazon SageMaker to save up to 90% training cost.</description>
  <pubDate>Tue, 22 Mar 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Speed up BERT inference with Hugging Face Transformers and AWS Inferentia</title>
  <link>https://www.philschmid.de/huggingface-bert-aws-inferentia</link>
  <guid>https://www.philschmid.de/huggingface-bert-aws-inferentia</guid>
  <description>Learn how to accelerate BERT and Transformers inference using Hugging Face Transformers and AWS Inferentia.</description>
  <pubDate>Wed, 16 Mar 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Creating document embeddings with Hugging Face's Transformers and Amazon SageMaker</title>
  <link>https://www.philschmid.de/custom-inference-huggingface-sagemaker</link>
  <guid>https://www.philschmid.de/custom-inference-huggingface-sagemaker</guid>
  <description>Learn how to use a custom Inference script for creating document embeddings with Hugging Face's Transformers, Amazon SageMaker, and Sentence Transformers.</description>
  <pubDate>Tue, 08 Mar 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Autoscaling BERT with Hugging Face Transformers, Amazon SageMaker and Terraform module</title>
  <link>https://www.philschmid.de/terraform-huggingface-amazon-sagemaker-advanced</link>
  <guid>https://www.philschmid.de/terraform-huggingface-amazon-sagemaker-advanced</guid>
  <description>Learn how to apply autoscaling to Hugging Face Transformers and Amazon SageMaker using Terraform.</description>
  <pubDate>Tue, 01 Mar 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Multi-Container Endpoints with Hugging Face Transformers and Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-huggingface-multi-container-endpoint</link>
  <guid>https://www.philschmid.de/sagemaker-huggingface-multi-container-endpoint</guid>
  <description>Learn how to deploy multiple Hugging Face Transformers for inference with Amazon SageMaker and Multi-Container Endpoints.</description>
  <pubDate>Tue, 22 Feb 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Asynchronous Inference with Hugging Face Transformers and Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-huggingface-async-inference</link>
  <guid>https://www.philschmid.de/sagemaker-huggingface-async-inference</guid>
  <description>Learn how to deploy an Asynchronous Inference model with Hugging Face Transformers and Amazon SageMaker, with autoscaling to zero.</description>
  <pubDate>Tue, 15 Feb 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy BERT with Hugging Face Transformers, Amazon SageMaker and Terraform module</title>
  <link>https://www.philschmid.de/terraform-huggingface-amazon-sagemaker</link>
  <guid>https://www.philschmid.de/terraform-huggingface-amazon-sagemaker</guid>
  <description>Learn how to deploy BERT/DistilBERT with Hugging Face Transformers using Amazon SageMaker and Terraform module.</description>
  <pubDate>Tue, 08 Feb 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Task-specific knowledge distillation for BERT using Transformers and Amazon SageMaker</title>
  <link>https://www.philschmid.de/knowledge-distillation-bert-transformers</link>
  <guid>https://www.philschmid.de/knowledge-distillation-bert-transformers</guid>
  <description>Learn how to apply task-specific knowledge distillation for BERT and text classification using Hugging Face Transformers and Amazon SageMaker, including hyperparameter search.</description>
  <pubDate>Tue, 01 Feb 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Distributed training on multilingual BERT with Hugging Face Transformers and Amazon SageMaker</title>
  <link>https://www.philschmid.de/pytorch-distributed-training-transformers</link>
  <guid>https://www.philschmid.de/pytorch-distributed-training-transformers</guid>
  <description>Learn how to run large-scale distributed training using multilingual BERT on over 1 million data points with Hugging Face Transformers and Amazon SageMaker</description>
  <pubDate>Tue, 25 Jan 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Financial Text Summarization with Hugging Face Transformers, Keras and Amazon SageMaker</title>
  <link>https://www.philschmid.de/financial-summarizatio-huggingface-keras</link>
  <guid>https://www.philschmid.de/financial-summarizatio-huggingface-keras</guid>
  <description>Learn how to fine-tune a Hugging Face Transformer for financial text summarization using vanilla `Keras`, `TensorFlow`, `Transformers`, `Datasets`, and Amazon SageMaker.</description>
  <pubDate>Wed, 19 Jan 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker</title>
  <link>https://www.philschmid.de/deploy-gptj-sagemaker</link>
  <guid>https://www.philschmid.de/deploy-gptj-sagemaker</guid>
  <description>Learn how to deploy EleutherAI's GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker.</description>
  <pubDate>Tue, 11 Jan 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Image Classification with Hugging Face Transformers and `Keras`</title>
  <link>https://www.philschmid.de/image-classification-huggingface-transformers-keras</link>
  <guid>https://www.philschmid.de/image-classification-huggingface-transformers-keras</guid>
  <description>Learn how to fine-tune a Vision Transformer for Image Classification Example using vanilla `Keras`, `Transformers`, `Datasets`.</description>
  <pubDate>Tue, 04 Jan 2022 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Workshop: Enterprise-Scale NLP with Hugging Face and Amazon SageMaker</title>
  <link>https://www.philschmid.de/hugginface-sagemaker-workshop</link>
  <guid>https://www.philschmid.de/hugginface-sagemaker-workshop</guid>
  <description>In October and November, we held a workshop series on “Enterprise-Scale NLP with Hugging Face and Amazon SageMaker”. The series consisted of 3 parts covering: Getting Started, Going to Production, and MLOps.</description>
  <pubDate>Wed, 29 Dec 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Hugging Face Transformers with Keras: Fine-tune a non-English BERT for Named Entity Recognition</title>
  <link>https://www.philschmid.de/huggingface-transformers-keras-tf</link>
  <guid>https://www.philschmid.de/huggingface-transformers-keras-tf</guid>
  <description>Learn how to fine-tune a non-English BERT for Named Entity Recognition using Hugging Face Transformers, Keras/TensorFlow, and Datasets.</description>
  <pubDate>Tue, 21 Dec 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>New Serverless Transformers using Amazon SageMaker Serverless Inference and Hugging Face</title>
  <link>https://www.philschmid.de/serverless-transformers-sagemaker-huggingface</link>
  <guid>https://www.philschmid.de/serverless-transformers-sagemaker-huggingface</guid>
  <description>Learn how to deploy Hugging Face Transformers serverless using the new Amazon SageMaker Serverless Inference.</description>
  <pubDate>Wed, 15 Dec 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Hugging Face Transformers BERT fine-tuning using Amazon SageMaker and Training Compiler</title>
  <link>https://www.philschmid.de/huggingface-amazon-sagemaker-training-compiler</link>
  <guid>https://www.philschmid.de/huggingface-amazon-sagemaker-training-compiler</guid>
  <description>Learn how to Compile and fine-tune a Multi-Class Classification Transformers with `Trainer` and `emotion` dataset using Amazon SageMaker Training Compiler.</description>
  <pubDate>Tue, 07 Dec 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>MLOps: Using the Hugging Face Hub as model registry with Amazon SageMaker</title>
  <link>https://www.philschmid.de/huggingface-hub-amazon-sagemaker</link>
  <guid>https://www.philschmid.de/huggingface-hub-amazon-sagemaker</guid>
  <description>Learn how to automatically save your model weights, logs, and artifacts to the Hugging Face Hub using Amazon SageMaker and how to deploy the model afterwards for inference.</description>
  <pubDate>Tue, 16 Nov 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>A remote guide to re:Invent 2021 machine learning sessions</title>
  <link>https://www.philschmid.de/re-invent-2021</link>
  <guid>https://www.philschmid.de/re-invent-2021</guid>
  <description>If you are like me, you are not from the USA and cannot easily travel to Las Vegas. Here is the perfect remote guide for a virtual re:Invent 2021 focused on NLP and machine learning.</description>
  <pubDate>Thu, 11 Nov 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>MLOps: End-to-End Hugging Face Transformers with the Hub and SageMaker Pipelines</title>
  <link>https://www.philschmid.de/mlops-sagemaker-huggingface-transformers</link>
  <guid>https://www.philschmid.de/mlops-sagemaker-huggingface-transformers</guid>
  <description>Learn how to build an End-to-End MLOps Pipeline for Hugging Face Transformers from training to production using Amazon SageMaker.</description>
  <pubDate>Wed, 10 Nov 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Going Production: Auto-scaling Hugging Face Transformers with Amazon SageMaker</title>
  <link>https://www.philschmid.de/auto-scaling-sagemaker-huggingface</link>
  <guid>https://www.philschmid.de/auto-scaling-sagemaker-huggingface</guid>
  <description>Learn how to add auto-scaling to your Hugging Face Transformers SageMaker Endpoints.</description>
  <pubDate>Fri, 29 Oct 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Deploy BigScience T0_3B to AWS and Amazon SageMaker</title>
  <link>https://www.philschmid.de/deploy-bigscience-t0-3b-to-aws-and-amazon-sagemaker</link>
  <guid>https://www.philschmid.de/deploy-bigscience-t0-3b-to-aws-and-amazon-sagemaker</guid>
  <description>🌸 BigScience released their first modeling paper introducing T0, which outperforms GPT-3 on many zero-shot tasks while being 16x smaller! Deploy the 3-billion-parameter version (T0_3B) to Amazon SageMaker with a few lines of code to run a scalable production workload!</description>
  <pubDate>Wed, 20 Oct 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Scalable, Secure Hugging Face Transformer Endpoints with Amazon SageMaker, AWS Lambda, and CDK</title>
  <link>https://www.philschmid.de/huggingface-transformers-cdk-sagemaker-lambda</link>
  <guid>https://www.philschmid.de/huggingface-transformers-cdk-sagemaker-lambda</guid>
  <description>Deploy Hugging Face Transformers to Amazon SageMaker and create an API for the Endpoint using AWS Lambda, API Gateway and AWS CDK.</description>
  <pubDate>Wed, 06 Oct 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Few-shot learning in practice with GPT-Neo</title>
  <link>https://www.philschmid.de/few-shot-learning-gpt-neo</link>
  <guid>https://www.philschmid.de/few-shot-learning-gpt-neo</guid>
  <description>The latest developments in NLP show that you can overcome this limitation by providing a few examples at inference time with a large language model - a technique known as Few-Shot Learning. In this blog post, we'll explain what Few-Shot Learning is and explore how to use it with a large language model called GPT-Neo.</description>
  <pubDate>Sat, 05 Jun 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker</title>
  <link>https://www.philschmid.de/sagemaker-distributed-training</link>
  <guid>https://www.philschmid.de/sagemaker-distributed-training</guid>
  <description>Learn how to train distributed models for summarization using Hugging Face Transformers and Amazon SageMaker and upload them afterwards to huggingface.co.</description>
  <pubDate>Fri, 09 Apr 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Multilingual Serverless XLM RoBERTa with HuggingFace, AWS Lambda</title>
  <link>https://www.philschmid.de/multilingual-serverless-xlm-roberta-with-huggingface</link>
  <guid>https://www.philschmid.de/multilingual-serverless-xlm-roberta-with-huggingface</guid>
  <description>Learn how to build a multilingual serverless BERT Question Answering API with a model size of more than 2GB and then test it in German and French.</description>
  <pubDate>Thu, 17 Dec 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Serverless BERT with HuggingFace, AWS Lambda, and Docker</title>
  <link>https://www.philschmid.de/serverless-bert-with-huggingface-aws-lambda-docker</link>
  <guid>https://www.philschmid.de/serverless-bert-with-huggingface-aws-lambda-docker</guid>
  <description>Learn how to combine the newest cutting-edge computing power of AWS with the benefits of serverless architectures to leverage Google's "State-of-the-Art" NLP model.</description>
  <pubDate>Sun, 06 Dec 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>AWS Lambda with custom docker images as runtime</title>
  <link>https://www.philschmid.de/aws-lambda-with-custom-docker-image</link>
  <guid>https://www.philschmid.de/aws-lambda-with-custom-docker-image</guid>
  <description>Learn how to build and deploy an AWS Lambda function with a custom Python Docker container as runtime using Amazon ECR.</description>
  <pubDate>Wed, 02 Dec 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>New Serverless BERT with Huggingface, AWS Lambda, and AWS EFS</title>
  <link>https://www.philschmid.de/new-serverless-bert-with-huggingface-aws-lambda</link>
  <guid>https://www.philschmid.de/new-serverless-bert-with-huggingface-aws-lambda</guid>
  <description>Build a serverless Question-Answering API using the Serverless Framework, AWS Lambda, AWS EFS, efsync, Terraform, the transformers Library from HuggingFace, and a `mobileBert` model from Google fine-tuned on SQuADv2.</description>
  <pubDate>Sun, 15 Nov 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>efsync my first open-source MLOps toolkit</title>
  <link>https://www.philschmid.de/efsync-my-first-open-source-mlops-toolkit</link>
  <guid>https://www.philschmid.de/efsync-my-first-open-source-mlops-toolkit</guid>
  <description>efsync is a CLI/SDK tool that automatically syncs files from S3 or a local filesystem to AWS EFS and enables you to install dependencies with the AWS Lambda runtime directly into your EFS filesystem.</description>
  <pubDate>Wed, 04 Nov 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>My path to become a certified solution architect</title>
  <link>https://www.philschmid.de/my-path-to-become-a-certified-solution-architect</link>
  <guid>https://www.philschmid.de/my-path-to-become-a-certified-solution-architect</guid>
  <description>This is the story of how I became a certified solution architect within 28 hours of preparation.</description>
  <pubDate>Sat, 24 Oct 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Create a custom GitHub Action in 4 steps</title>
  <link>https://www.philschmid.de/create-custom-github-action-in-4-steps</link>
  <guid>https://www.philschmid.de/create-custom-github-action-in-4-steps</guid>
  <description>Create a custom GitHub Action in 4 steps. Also learn how to test it offline and publish it in the GitHub Actions Marketplace.</description>
  <pubDate>Fri, 25 Sep 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tune a non-English GPT-2 Model with Huggingface</title>
  <link>https://www.philschmid.de/fine-tune-a-non-english-gpt-2-model-with-huggingface</link>
  <guid>https://www.philschmid.de/fine-tune-a-non-english-gpt-2-model-with-huggingface</guid>
  <description>Fine-tune a non-English, German GPT-2 model with Huggingface on German recipes, using their Trainer class and Pipeline objects.</description>
  <pubDate>Sun, 06 Sep 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Mount your AWS EFS volume into AWS Lambda with the Serverless Framework</title>
  <link>https://www.philschmid.de/mount-your-aws-efs-volume-into-aws-lambda-with-the-serverless-framework</link>
  <guid>https://www.philschmid.de/mount-your-aws-efs-volume-into-aws-lambda-with-the-serverless-framework</guid>
  <description>Leverage your serverless architectures by mounting your AWS EFS volume into your AWS Lambda function with the Serverless Framework.</description>
  <pubDate>Wed, 12 Aug 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Serverless BERT with HuggingFace and AWS Lambda</title>
  <link>https://www.philschmid.de/serverless-bert-with-huggingface-and-aws-lambda</link>
  <guid>https://www.philschmid.de/serverless-bert-with-huggingface-and-aws-lambda</guid>
  <description>Build a serverless question-answering API with BERT, HuggingFace, the Serverless Framework and AWS Lambda.</description>
  <pubDate>Tue, 30 Jun 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to use Google Tag Manager and Google Analytics without Cookies</title>
  <link>https://www.philschmid.de/how-to-use-google-tag-manager-and-google-analytics-without-cookies</link>
  <guid>https://www.philschmid.de/how-to-use-google-tag-manager-and-google-analytics-without-cookies</guid>
  <description>Connect your user behavior with technical insights without using cookies to improve your customer experience.</description>
  <pubDate>Sat, 06 Jun 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>BERT Text Classification in a different language</title>
  <link>https://www.philschmid.de/bert-text-classification-in-a-different-language</link>
  <guid>https://www.philschmid.de/bert-text-classification-in-a-different-language</guid>
  <description>Build a non-English (German) BERT multi-class text classification model with HuggingFace and Simple Transformers.</description>
  <pubDate>Fri, 22 May 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Scaling Machine Learning from ZERO to HERO</title>
  <link>https://www.philschmid.de/scaling-machine-learning-from-zero-to-hero</link>
  <guid>https://www.philschmid.de/scaling-machine-learning-from-zero-to-hero</guid>
  <description>Scale your machine learning models using AWS Lambda, the Serverless Framework, and PyTorch. I will show you how to build scalable deep learning inference architectures.</description>
  <pubDate>Fri, 08 May 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Getting Started with AutoML and AWS AutoGluon</title>
  <link>https://www.philschmid.de/getting-started-with-automl-and-aws-autogluon</link>
  <guid>https://www.philschmid.de/getting-started-with-automl-and-aws-autogluon</guid>
  <description>Build an object detection model with AWS's AutoML library AutoGluon.</description>
  <pubDate>Mon, 20 Apr 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>K-Fold as Cross-Validation with a BERT Text-Classification Example</title>
  <link>https://www.philschmid.de/k-fold-as-cross-validation-with-a-bert-text-classification-example</link>
  <guid>https://www.philschmid.de/k-fold-as-cross-validation-with-a-bert-text-classification-example</guid>
  <description>Use K-Fold Cross-Validation to improve your Transformers model validation, shown with a BERT text-classification example.</description>
  <pubDate>Tue, 07 Apr 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>How to Set Up a CI/CD Pipeline for AWS Lambda With GitHub Actions and Serverless</title>
  <link>https://www.philschmid.de/how-to-set-up-a-ci-cd-pipeline-for-aws-lambda-with-github-actions-and-serverless</link>
  <guid>https://www.philschmid.de/how-to-set-up-a-ci-cd-pipeline-for-aws-lambda-with-github-actions-and-serverless</guid>
  <description>Automatically deploy your Python function with dependencies in less than five minutes</description>
  <pubDate>Wed, 01 Apr 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Set up a CI/CD Pipeline for your Web app on AWS with GitHub Actions</title>
  <link>https://www.philschmid.de/set-up-a-ci-cd-pipeline-for-your-web-app-on-aws-s3-with-github-actions</link>
  <guid>https://www.philschmid.de/set-up-a-ci-cd-pipeline-for-your-web-app-on-aws-s3-with-github-actions</guid>
  <description>Automatically deploy your React, Vue, Angular, or Svelte app on S3 and create a cache invalidation with GitHub Actions.</description>
  <pubDate>Wed, 25 Mar 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Getting started with CNNs by calculating LeNet-Layer manually</title>
  <link>https://www.philschmid.de/getting-started-with-cnn-by-calculating-lenet-layer-manually</link>
  <guid>https://www.philschmid.de/getting-started-with-cnn-by-calculating-lenet-layer-manually</guid>
  <description>A getting-started explanation of CNNs by calculating Yann LeCun's LeNet-5 manually for handwritten digits, learning about padding and stride along the way.</description>
  <pubDate>Fri, 28 Feb 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Google Colab the free GPU/TPU Jupyter Notebook Service</title>
  <link>https://www.philschmid.de/google-cola-the-free-gpu-jupyter</link>
  <guid>https://www.philschmid.de/google-cola-the-free-gpu-jupyter</guid>
  <description>A short introduction to Google Colab, a free Jupyter notebook service from Google. Learn how to use accelerated hardware like GPUs and TPUs to run your machine learning workloads completely free in the cloud.</description>
  <pubDate>Wed, 26 Feb 2020 00:00:00 GMT</pubDate>
</item>
    </channel>
  </rss>