# Writing a Good AGENTS.md
February 24, 2026 · 4 minute read
An AGENTS.md (or GEMINI.md) file is the single highest-leverage configuration point for coding agents. It's injected into every conversation, acting as the agent's onboarding document to your codebase. But research shows that doing it wrong actively hurts performance. Here's how to do it right, backed by data from "Evaluating AGENTS.md" (ETH Zurich, 2025) and practical experience from HumanLayer.
## The Data Says: Less Is More
- Auto-generated AGENTS.md files reduce task success rates by ~3% on average across multiple agents and models, while increasing inference cost by over 20% (§4.2).
- Human-written AGENTS.md files only marginally improve performance (~4%), and still increase cost by up to 19% due to extra steps the agent takes.
- Stronger models don't generate better context files. GPT-5.2-generated files improved performance on one benchmark by 2% but degraded it on another by 3% (§4.4).
- Codebase overviews in AGENTS.md don't help agents navigate faster. Agents found relevant files in roughly the same number of steps with or without an overview section (§4.2).
- LLM-generated files are redundant with existing docs. When all other documentation was removed, LLM-generated files actually improved performance by 2.7%—they only hurt when they duplicate what's already discoverable (§4.2).
- Instructions ARE followed—that's the problem. Agents respect AGENTS.md instructions, but unnecessary requirements make tasks harder, increasing reasoning tokens by 14–22% (§4.3).
## What to Include
- The WHAT: Your tech stack, project structure, and what each part does. Critical for monorepos—tell the agent what the apps, shared packages, and services are.
- The WHY: The purpose of the project and its key components. Help the agent understand intent, not just structure.
- The HOW: How to build, test, and verify changes. Include non-obvious tooling (e.g., `uv` instead of `pip`, `bun` instead of `npm`). Tools mentioned in AGENTS.md get used 160x more often than unmentioned ones (§4.3).
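Putting the WHAT, WHY, and HOW together, a minimal AGENTS.md might look like this (the project layout, paths, and commands are illustrative, not a template to copy verbatim):

```markdown
# AGENTS.md

## What
Monorepo for an invoicing product:
- `apps/web` – Next.js frontend
- `apps/api` – FastAPI backend
- `packages/shared` – types shared between web and api

## Why
The api is the source of truth; the web app never talks to the database directly.

## How
- Install deps with `uv sync` (not `pip install`)
- Run tests with `uv run pytest`
- Lint with `ruff check .` before committing
```

Note that the HOW section is where the 160x tool-usage effect pays off: naming `uv` and `ruff` explicitly is what makes the agent reach for them.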
## What NOT to Include
- Detailed codebase overviews or directory listings. The paper found these don't help agents navigate faster, and agents can discover structure themselves.
- Code style guidelines. Use linters and formatters instead—they're faster, cheaper, and deterministic. LLMs are in-context learners and will follow existing patterns in your code.
- Task-specific instructions that only apply sometimes. Since AGENTS.md goes into every session, non-universal instructions dilute focus. Frontier models can reliably follow ~150–200 instructions; the agent harness already uses ~50 of those.
- Auto-generated content. Don't use `/init` or let the agent write its own AGENTS.md. The data shows this hurts more than it helps.
## How to Structure It
- Keep it short. General consensus is <300 lines; HumanLayer keeps theirs under 60 lines. Every line goes into every session—make each one count.
- Use progressive disclosure. Don't put everything in AGENTS.md. Instead, keep task-specific docs in separate files (e.g., `agent_docs/running_tests.md`, `agent_docs/database_schema.md`) and list them in AGENTS.md with brief descriptions so the agent reads them only when relevant.
- Prefer pointers over copies. Reference `file:line` locations rather than embedding code snippets that will go stale.
- Write it yourself, deliberately. A bad line in AGENTS.md cascades into bad plans, bad code, and bad results across every session. Treat it like infrastructure, not a scratchpad.
Thanks for reading! If you have any questions or feedback, please let me know on Twitter or LinkedIn.