OpenAI Codex CLI, how does it work?

April 17, 2025 · 11 minute read

OpenAI Codex is an open-source CLI released alongside OpenAI o3/o4-mini as a "chat-driven development" tool. It lets developers use AI models via the API directly in their terminal to perform coding tasks. Unlike a simple chatbot, it can read files, write files (via patches), execute shell commands (often sandboxed), and iterate based on the results and user feedback.

Note: This overview was generated with Gemini 2.5 Pro and then collaboratively iterated on by Gemini 2.5 Pro and me.

Core Components & Workflow

User Interface (UI)

Agent Loop

  • The core logic resides in src/utils/agent/agent-loop.ts.
  • The AgentLoop class manages the interaction cycle with the OpenAI API.
  • It takes the user's input, combines it with conversation history and instructions, and sends it to the model.
  • It uses the openai Node.js library (v4+) and specifically calls openai.responses.create, indicating use of the /responses endpoint which supports streaming and tool use.
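The request the AgentLoop assembles for the /responses endpoint might look roughly like the sketch below. Field names follow the OpenAI Responses API, but buildPayload itself is an illustrative helper, not the actual implementation in agent-loop.ts:

```typescript
// Hypothetical sketch of the payload the AgentLoop assembles for the
// /responses endpoint; buildPayload is an illustrative stand-in.
type InputItem = { role: "user" | "assistant"; content: unknown };

function buildPayload(
  model: string,
  instructions: string,
  history: InputItem[],
  previousResponseId?: string,
) {
  return {
    model,
    instructions, // system prompt + combined user/project instructions
    input: history, // conversation items accumulated so far
    stream: true, // ask for a streaming response
    tools: [{ type: "function", name: "shell" }], // simplified tool stub
    ...(previousResponseId ? { previous_response_id: previousResponseId } : {}),
  };
}

const payload = buildPayload("o4-mini", "You are the Codex CLI...", [
  { role: "user", content: [{ type: "input_text", text: "Refactor utils.ts" }] },
]);
```

On follow-up turns, passing previous_response_id lets the API thread the new request onto the prior response rather than starting from scratch.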

Model Interaction

  • The AgentLoop sends the context (history, instructions, user input) to the specified model (default o4-mini, configurable via --model or config file).
  • It requests a streaming response.
  • It handles different response item types (message, function_call, function_call_output, reasoning).
  • src/utils/model-utils.ts handles fetching available models and checking compatibility.
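Handling the stream then reduces to dispatching on the item types listed above. The shapes below are simplified from the Responses API; describeItem is a hypothetical helper, not code from the repo:

```typescript
// Simplified dispatch over response item types; real stream items carry
// more fields than this sketch.
type ResponseItem =
  | { type: "message"; role: string; content: unknown }
  | { type: "function_call"; name: string; arguments: string; call_id: string }
  | { type: "function_call_output"; call_id: string; output: string }
  | { type: "reasoning" };

function describeItem(item: ResponseItem): string {
  switch (item.type) {
    case "message":
      return "render assistant text in the UI";
    case "function_call":
      return "run the requested tool (e.g. shell)";
    case "function_call_output":
      return "tool result already recorded in history";
    case "reasoning":
      return "optionally surface a thinking summary";
  }
}
```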

Tools & Execution

  • The primary "tool" defined is shell (or container.exec), allowing the model to request shell command execution. See the tools array in src/utils/agent/agent-loop.ts.
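A function tool of this shape, declared in JSON-schema style, is what the Responses API expects. The exact schema in agent-loop.ts may differ; this sketch uses the cmd argument name seen later in the walkthrough, and the workdir/timeout fields are assumptions:

```typescript
// Hedged sketch of a shell tool definition for the Responses API; the
// real tools array in agent-loop.ts may use a different schema.
const shellTool = {
  type: "function",
  name: "shell",
  description: "Runs a shell command and returns its output",
  parameters: {
    type: "object",
    properties: {
      cmd: { type: "array", items: { type: "string" } }, // argv-style command
      workdir: { type: "string" }, // assumed optional working directory
      timeout: { type: "number" }, // assumed timeout in milliseconds
    },
    required: ["cmd"],
  },
} as const;
```

The model then "calls" this tool by emitting a function_call item whose arguments field is a JSON string matching the schema, e.g. '{"cmd": ["cat", "utils.ts"]}'.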

Command Execution

Sandboxing

  • The execution logic in handleExecCommand decides how to run the command based on the approval policy and safety assessment.
  • full-auto mode implies sandboxing.
  • src/utils/agent/sandbox/ contains the sandboxing implementations:
    • macos-seatbelt.ts: Uses macOS's sandbox-exec to restrict file system access and block network calls (READ_ONLY_SEATBELT_POLICY). Writable paths are whitelisted.
    • raw-exec.ts: Executes commands directly without sandboxing (used when sandboxing isn't needed or available).
    • Linux: The README.md, Dockerfile, and scripts/ indicate a Docker-based approach. The CLI runs inside a minimal container where scripts/init_firewall.sh uses iptables/ipset to restrict network access only to the OpenAI API. The user's project directory is mounted into the container.
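The platform-dependent choice between these implementations can be sketched as below. pickSandbox and seatbeltArgv are hypothetical helpers (the real dispatch lives across handle-exec-command.ts and src/utils/agent/sandbox/), though sandbox-exec's -p (inline policy) and -D (policy parameter) flags are real:

```typescript
// Illustrative sandbox selection: seatbelt on macOS, raw exec otherwise
// (on Linux the whole CLI already runs inside a Docker container).
type SandboxType = "macos.seatbelt" | "none";

function pickSandbox(runInSandbox: boolean, platform: string): SandboxType {
  if (!runInSandbox) return "none"; // approved/safe command, no sandbox needed
  if (platform === "darwin") return "macos.seatbelt"; // sandbox-exec available
  return "none"; // containerized environments rely on the outer sandbox
}

// Rough shape of a Seatbelt invocation wrapping the real command:
function seatbeltArgv(cmd: string[], writableRoot: string): string[] {
  return [
    "sandbox-exec",
    "-p", "(version 1) ...read-only policy...", // stands in for READ_ONLY_SEATBELT_POLICY
    "-D", `WRITABLE_ROOT=${writableRoot}`, // whitelisted writable path
    "--",
    ...cmd,
  ];
}
```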

File Patching (apply_patch)
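The model edits files by emitting a custom patch envelope (*** Begin Patch / *** Update File: ... / *** End Patch, as seen later in the walkthrough), which apply_patch parses and applies via Node.js fs calls. The helper below is a hypothetical sketch of recognizing that envelope, not the real process_patch in src/utils/agent/apply-patch.ts:

```typescript
// Hedged sketch: extract the file paths a patch envelope touches.
function parsePatchTargets(patch: string): string[] {
  const lines = patch.split("\n");
  if (lines[0] !== "*** Begin Patch") throw new Error("missing patch envelope");
  const targets: string[] = [];
  for (const line of lines) {
    // Assumed operation markers: Update, Add, Delete
    const m = line.match(/^\*\*\* (?:Update|Add|Delete) File: (.+)$/);
    if (m && m[1]) targets.push(m[1]);
  }
  return targets;
}
```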

Prompts & Context Awareness

  • System Prompt: A long, detailed system prompt is hardcoded (as the prefix constant) in src/utils/agent/agent-loop.ts. It tells the model about its role as the Codex CLI, its capabilities (shell, patching), constraints (sandboxing), and coding guidelines.
  • User Instructions: Instructions are gathered from both global (~/.codex/instructions.md) and project-specific (codex.md or similar, discovered via logic in src/utils/config.ts) files. These combined instructions are prepended to the conversation history sent to the model.
  • Conversation History: The items array (containing ResponseItem objects like user messages, assistant messages, tool calls, tool outputs) is passed back to the model on each turn, providing conversational context. src/utils/approximate-tokens-used.ts estimates context window usage.
  • File Context (Standard Mode): The agent doesn't automatically read project files. It gains file context only when the model explicitly requests to read a file (e.g., via cat) or when file content appears in the output of a previous command (e.g., git diff).
  • File Context (Experimental --full-context Mode): This mode uses a distinct flow (see src/cli_singlepass.tsx, src/utils/singlepass/) that gathers project file context up front in a single pass rather than on demand.
  • Configuration: Stores the default model, approval mode settings, etc. Managed by src/utils/config.ts, which loads ~/.codex/config.yaml (or .yml/.json); this file lives in the user's home directory, not in the repo.
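Estimating context usage from character counts is a common shortcut for GPT-style tokenizers (roughly four characters per token). The sketch below shows the idea; it does not claim to be the exact logic in approximate-tokens-used.ts:

```typescript
// Rough token estimate: ~4 characters per token, a common heuristic for
// GPT-style tokenizers. Illustrative, not the repo's exact implementation.
function approximateTokensUsed(items: { text?: string }[]): number {
  const chars = items.reduce((n, item) => n + (item.text?.length ?? 0), 0);
  return Math.ceil(chars / 4);
}
```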

Step-by-Step Manual Walkthrough (Simulating the CLI)

Let's imagine the user runs: codex "Refactor utils.ts to use arrow functions" in a directory /home/user/myproject.

  1. Initialization (cli.tsx, app.tsx):

    • Parse arguments: Prompt is "Refactor...", model is default (o4-mini), approval mode is default (suggest).
    • Load config (loadConfig in src/utils/config.ts): Read ~/.codex/config.yaml and ~/.codex/instructions.md.
    • Discover and load project docs (loadProjectDoc in src/utils/config.ts): Find /home/user/myproject/codex.md and read its content.
    • Combine instructions: Merge user instructions and project docs.
    • Check Git status (checkInGit in src/utils/check-in-git.ts): Confirm /home/user/myproject is a Git repo.
    • Render the main UI (TerminalChat).
  2. First API Call (AgentLoop.run in src/utils/agent/agent-loop.ts):

    • Create initial input: [{ role: "user", content: [{ type: "input_text", text: "Refactor..." }] }].
    • Construct API request payload: Include system prompt (from prefix), combined instructions, and the user input message. Set model: "o4-mini", stream: true, tools: [...]. No previous_response_id.
    • Send request: Call openai.responses.create(...) (using the openai library). UI shows "Thinking...".
  3. Model Response (Stream):

    • Assume the model decides it needs to read the file first.
    • Stream event 1: response.output_item.done with item: { type: "function_call", name: "shell", arguments: '{"cmd": ["cat", "utils.ts"]}', call_id: "call_1" }
    • Stream event 2: response.completed with output: [...] containing the same function call, id: "resp_1".
    • Agent receives the function call. onLastResponseId is called with "resp_1".
  4. Tool Call Handling (handleExecCommand in src/utils/agent/handle-exec-command.ts):

    • Parse arguments: cmd = ["cat", "utils.ts"].
    • Check approval: canAutoApprove(["cat", "utils.ts"], "suggest", ["/home/user/myproject"]) (in src/approvals.ts) -> returns { type: "auto-approve", reason: "View file contents", group: "Reading files", runInSandbox: false }.
    • Execute command (execCommand in src/utils/agent/handle-exec-command.ts): Run cat utils.ts directly (no sandbox needed for safe commands). Note: Assuming utils.ts exists at the root for this example; in reality, the model might need to specify a path like src/utils.ts.
    • Simulate result: stdout = "/* content of utils.ts */", stderr = "", exitCode = 0.
  5. Second API Call (AgentLoop.run continues):

    • Format tool result: Create a function_call_output item like { type: "function_call_output", call_id: "call_1", output: '{"output": "/* content ... */", "metadata": {"exit_code": 0, ...}}' }.
    • Construct API request payload: Include system prompt, combined instructions, the entire history so far (user message, assistant function call request, function call output), set previous_response_id: "resp_1".
    • Send request. UI shows "Thinking...".
  6. Model Response (Stream):

    • Assume model generates the refactored code and decides to apply it.
    • Stream event 1: response.output_item.done with item: { type: "function_call", name: "shell", arguments: '{"cmd": ["apply_patch", "*** Begin Patch\n*** Update File: utils.ts\n@@ ... -old +new ...\n*** End Patch"]}', call_id: "call_2" }.
    • Stream event 2: response.completed with output: [...] containing the patch function call, id: "resp_2".
    • Agent receives the patch function call. onLastResponseId is called with "resp_2".
  7. Tool Call Handling (Patch):

    • Parse arguments: Identify apply_patch and extract the patch text.
    • Check approval: canAutoApprove(["apply_patch", "..."], "suggest", ["/home/user/myproject"]). Since policy is suggest, this returns { type: "ask-user", applyPatch: { patch: "..." } }.
    • Request confirmation (requestConfirmation): The UI (TerminalChatCommandReview) displays the patch diff and asks "Allow command? [y/N/e/a]".
    • User reviews and presses 'y'. submitConfirmation is called with { decision: ReviewDecision.YES }.
    • Execute patch (execApplyPatch in src/utils/agent/exec.ts -> process_patch in src/utils/agent/apply-patch.ts): Reads utils.ts, applies the diff logic, and writes the modified content back using Node.js fs.writeFileSync.
    • Simulate result: stdout = "Done!", stderr = "", exitCode = 0.
  8. Third API Call:

    • Format tool result: Create function_call_output item for the patch, { call_id: "call_2", output: '{"output": "Done!", ...}' }.
    • Construct API request: Include history + patch result, previous_response_id: "resp_2".
    • Send request.
  9. Model Response (Final):

    • Assume model confirms the refactoring is done.
    • Stream event 1: response.output_item.done with item: { type: "message", role: "assistant", content: [{ type: "output_text", text: "OK, I've refactored utils.ts to use arrow functions." }] }.
    • Stream event 2: response.completed, id: "resp_3".
    • Agent receives the message. onLastResponseId called with "resp_3".
    • No more tool calls. The loop finishes for this turn. UI stops showing "Thinking...".
  10. User Interaction:

    • The user sees the final message and the updated prompt, ready for the next command. The file utils.ts on their disk has been modified.
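Putting steps 4 and 7 side by side, the approval decision that separates them can be sketched as a single function. This is a minimal illustration of the behavior described above; the real canAutoApprove in src/approvals.ts has a richer signature and return shape, and the allow-list here is invented:

```typescript
// Minimal sketch of the approval decision: safe reads auto-approve, writes
// and patches ask the user under the default "suggest" policy.
type Decision =
  | { type: "auto-approve"; runInSandbox: boolean }
  | { type: "ask-user" };

// Illustrative allow-list of read-only commands (not the real one).
const SAFE_COMMANDS = new Set(["cat", "ls", "pwd", "head"]);

function canAutoApprove(
  cmd: string[],
  policy: "suggest" | "auto-edit" | "full-auto",
): Decision {
  if (policy === "full-auto") return { type: "auto-approve", runInSandbox: true };
  const head = cmd[0];
  if (head !== undefined && SAFE_COMMANDS.has(head)) {
    return { type: "auto-approve", runInSandbox: false };
  }
  return { type: "ask-user" }; // e.g. apply_patch under "suggest"
}
```

Under this scheme, the cat in step 4 auto-approves while the apply_patch in step 7 triggers the confirmation prompt, matching the walkthrough.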