OpenAI Codex CLI, how does it work?
OpenAI Codex is an open-source CLI released alongside OpenAI o3/o4-mini as a "chat-driven development" tool. It lets developers use AI models via the API directly in their terminal to perform coding tasks. Unlike a simple chatbot, it can read files, write files (via patches), execute shell commands (often sandboxed), and iterate based on the results and user feedback.
Note: This overview was generated with Gemini 2.5 Pro and then collaboratively iterated on by Gemini 2.5 Pro and myself.
Core Components & Workflow
User Interface (UI)
- The interactive terminal UI is built using `ink` and `react`, offering a richer experience than plain text (a minimal example of this pattern follows below). Key components reside in `src/components/`, particularly within `src/components/chat/`.
- The application entry point is `src/cli.tsx` (using `meow` for argument parsing), which sets up the main `TerminalChat` component via `src/app.tsx`. `TerminalChat` manages the overall display, including history, input prompts, loading states, and overlays.
- User input is handled by `TerminalChatInput` (or `TerminalChatNewInput`), supporting command history and slash commands.
- The conversation history is displayed by `TerminalMessageHistory`, using components like `TerminalChatResponseItem` to render different message types.
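As a concrete (if simplified) illustration of the `ink` + `react` pattern, here is a minimal terminal UI sketch; it is not code from the repo, just the rendering model the CLI builds on:

```tsx
// Minimal ink + react sketch (illustrative, not actual Codex CLI code).
// `render` mounts a react component tree onto the terminal instead of the DOM.
import React, { useEffect, useState } from "react";
import { Box, Text, render } from "ink";

function Spinner(): JSX.Element {
  const frames = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴"];
  const [frame, setFrame] = useState(0);
  useEffect(() => {
    const id = setInterval(() => setFrame((f) => (f + 1) % frames.length), 80);
    return () => clearInterval(id); // stop the timer on unmount
  }, []);
  return <Text color="cyan">{frames[frame]} Thinking...</Text>;
}

render(
  <Box flexDirection="column">
    <Text bold>codex</Text>
    <Spinner />
  </Box>,
);
```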
Agent Loop
- The core logic resides in `src/utils/agent/agent-loop.ts`.
- The `AgentLoop` class manages the interaction cycle with the OpenAI API.
- It takes the user's input, combines it with conversation history and instructions, and sends it to the model.
- It uses the `openai` Node.js library (v4+) and specifically calls `openai.responses.create`, indicating use of the `/responses` endpoint, which supports streaming and tool use. A simplified sketch of this cycle follows below.
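As a rough mental model, the turn cycle can be sketched like this; this is a simplified approximation, not the actual `AgentLoop` implementation (streaming, cancellation, and approval handling are omitted):

```ts
// Simplified sketch of the agent turn loop (illustrative only).
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical stand-in for handleExecCommand: runs one tool call and
// wraps its result as a `function_call_output` item.
async function execToolCall(call: any): Promise<any> {
  return { type: "function_call_output", call_id: call.call_id, output: "..." };
}

async function runTurn(input: any[], previousResponseId?: string): Promise<any> {
  const response = await client.responses.create({
    model: "o4-mini",
    input, // new items for this turn (instructions and tools omitted here)
    previous_response_id: previousResponseId,
  });

  // If the model requested tool calls, execute them and recurse with the
  // outputs; otherwise the assistant's message ends the turn.
  const calls = response.output.filter((item: any) => item.type === "function_call");
  if (calls.length === 0) return response;

  const outputs = await Promise.all(calls.map(execToolCall));
  return runTurn(outputs, response.id);
}
```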
Model Interaction
- The `AgentLoop` sends the context (history, instructions, user input) to the specified model (default `o4-mini`, configurable via `--model` or a config file).
- It requests a streaming response.
- It handles the different response item types (`message`, `function_call`, `function_call_output`, `reasoning`); a sketch of this dispatch follows below.
- `src/utils/model-utils.ts` handles fetching available models and checking compatibility.
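Dispatching on those item types presumably looks something like the following sketch (names simplified; the real handler also tracks call IDs and UI state):

```ts
// Illustrative dispatch over Responses API output items.
function handleResponseItem(item: any): void {
  switch (item.type) {
    case "message":
      // Assistant text: append to the visible chat history.
      break;
    case "function_call":
      // Tool request (e.g. `shell`): route to command handling.
      break;
    case "function_call_output":
      // A tool result echoed back as part of the transcript.
      break;
    case "reasoning":
      // Reasoning items from o-series models; rendered or skipped by the UI.
      break;
    default:
      // Ignore unknown types so new API item kinds don't crash the CLI.
      break;
  }
}
```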
Tools & Execution
- The primary "tool" defined is
shell
(orcontainer.exec
), allowing the model to request shell command execution. See thetools
array insrc/utils/agent/agent-loop.ts
.
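A function tool entry in the Responses API has roughly this shape; the description and JSON schema below are illustrative guesses, not the exact text from `agent-loop.ts`:

```ts
// Approximate shape of the `shell` tool definition (illustrative).
const shellTool = {
  type: "function",
  name: "shell",
  description: "Runs a shell command and returns its output.",
  parameters: {
    type: "object",
    properties: {
      cmd: {
        type: "array",
        items: { type: "string" },
        description: 'Command and arguments, e.g. ["cat", "utils.ts"]',
      },
      workdir: { type: "string", description: "Working directory (optional)" },
    },
    required: ["cmd"],
  },
} as const;
```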
Command Execution
- When the model emits a `function_call` for `shell`, the `AgentLoop` invokes `handleExecCommand` (`src/utils/agent/handle-exec-command.ts`).
- This function checks the approval policy (`suggest`, `auto-edit`, `full-auto`).
- `src/approvals.ts` (`canAutoApprove`) determines whether the command is known-safe or needs user confirmation based on the policy; a condensed sketch follows below.
- If confirmation is needed, the UI (`TerminalChatCommandReview`) prompts the user.
- If approved (or auto-approved), the command is executed via `src/utils/agent/exec.ts`.
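A condensed sketch of that decision logic; the real `canAutoApprove` parses commands far more carefully and covers many more cases:

```ts
// Toy approximation of the approval decision (illustrative only).
type SafetyAssessment =
  | { type: "auto-approve"; reason: string; runInSandbox: boolean }
  | { type: "ask-user" };

// A few commands that are plausibly treated as known-safe reads.
const KNOWN_SAFE = new Set(["cat", "ls", "pwd", "grep", "head", "tail"]);

function assessCommand(
  cmd: string[],
  policy: "suggest" | "auto-edit" | "full-auto",
): SafetyAssessment {
  if (KNOWN_SAFE.has(cmd[0])) {
    // Read-only commands run immediately, outside the sandbox.
    return { type: "auto-approve", reason: "Known-safe read", runInSandbox: false };
  }
  if (policy === "full-auto") {
    // Everything else auto-runs under full-auto, but inside the sandbox.
    return { type: "auto-approve", reason: "Full-auto policy", runInSandbox: true };
  }
  // Under `suggest` (and for non-edit commands under `auto-edit`), ask the user.
  return { type: "ask-user" };
}
```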
Sandboxing
- The execution logic in `handleExecCommand` decides how to run the command based on the approval policy and safety assessment. `full-auto` mode implies sandboxing.
- `src/utils/agent/sandbox/` contains the sandboxing implementations:
  - `macos-seatbelt.ts`: uses macOS's `sandbox-exec` to restrict file-system access and block network calls (`READ_ONLY_SEATBELT_POLICY`). Writable paths are whitelisted. A rough sketch of the wrapping appears below.
  - `raw-exec.ts`: executes commands directly without sandboxing (used when sandboxing isn't needed or available).
- Linux: the `README.md`, `Dockerfile`, and `scripts/` indicate a Docker-based approach. The CLI runs inside a minimal container where `scripts/init_firewall.sh` uses `iptables`/`ipset` to restrict network access to the OpenAI API only. The user's project directory is mounted into the container.
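On macOS, the wrapping amounts to prefixing the command with `sandbox-exec` and a policy; the snippet below is a rough sketch with a made-up minimal policy, whereas the real `READ_ONLY_SEATBELT_POLICY` is a full seatbelt profile:

```ts
// Rough sketch of seatbelt wrapping (illustrative policy and wiring).
import { spawn } from "node:child_process";

// Deny writes by default, then whitelist one writable root via a parameter.
const POLICY = `
(version 1)
(allow default)
(deny file-write*)
(allow file-write* (subpath (param "WRITABLE_ROOT")))
`;

function execInSeatbelt(cmd: string[], writableRoot: string) {
  return spawn(
    "sandbox-exec",
    ["-p", POLICY, "-D", `WRITABLE_ROOT=${writableRoot}`, ...cmd],
    { stdio: "pipe" },
  );
}
```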
File Patching (`apply_patch`)
- The model is instructed (via the system prompt in `src/utils/agent/agent-loop.ts`) to use a specific format like `{"cmd":["apply_patch","*** Begin Patch..."]}` when it wants to edit files. `handleExecCommand` detects this pattern; the expanded patch envelope is shown below.
- Instead of running `apply_patch` as a shell command, it uses `execApplyPatch` (`src/utils/agent/exec.ts`), which calls `process_patch` from `src/utils/agent/apply-patch.ts`.
- `src/utils/agent/apply-patch.ts` parses the patch format and uses Node.js `fs` calls to modify files directly on the host system (or within the container on Linux).
- `parse-apply-patch.ts` (likely used by the UI) helps render diffs for user review.
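Expanded from the escaped JSON string above, the patch envelope the model emits looks like this (the hunk content here is illustrative):

```
*** Begin Patch
*** Update File: utils.ts
@@ export function formatName(user) {
-function helper(x) {
-  return x * 2;
-}
+const helper = (x) => x * 2;
*** End Patch
```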
Prompts & Context Awareness
- System Prompt: A long, detailed system prompt is hardcoded as `prefix` within `src/utils/agent/agent-loop.ts`. It tells the model about its role as the Codex CLI, its capabilities (shell, patching), its constraints (sandboxing), and coding guidelines.
- User Instructions: Instructions are gathered from both global (`~/.codex/instructions.md`) and project-specific (`codex.md` or similar, discovered via logic in `src/utils/config.ts`) files. The combined instructions are prepended to the conversation history sent to the model.
- Conversation History: The `items` array (containing `ResponseItem` objects such as user messages, assistant messages, tool calls, and tool outputs) is passed back to the model on each turn, providing conversational context. `src/utils/approximate-tokens-used.ts` estimates context-window usage.
- File Context (Standard Mode): The agent doesn't automatically read project files. It gains file context only when the model explicitly requests to read a file (e.g., via `cat`) or when file content appears in the output of a previous command (e.g., `git diff`).
- File Context (Experimental `--full-context` Mode): This mode uses a distinct flow (see `src/cli_singlepass.tsx`, `src/utils/singlepass/`). It involves:
  - Walking the directory, reading, and caching files via `src/utils/singlepass/context_files.ts`.
  - Formatting the prompt, directory structure, and file contents into a single large request using `src/utils/singlepass/context.ts`.
  - Expecting the model to return all file changes (creations, updates, deletes, moves) in a specific Zod schema defined in `src/utils/singlepass/file_ops.ts` (a sketch of such a schema follows this list).
- Configuration: Stores the default model, approval-mode settings, etc. Managed by `src/utils/config.ts`, which loads from `~/.codex/config.yaml` (or `.yml`/`.json`) (not in the repo).
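As a sketch of what such a schema could look like (the field names here are hypothetical; the real definitions live in `src/utils/singlepass/file_ops.ts` and will differ in detail):

```ts
// Hypothetical shape of the full-context file-operations schema.
import { z } from "zod";

const FileOperationSchema = z.object({
  path: z.string().describe("Path of the file this operation targets"),
  updated_full_content: z.string().optional().describe("New full file content"),
  delete: z.boolean().optional().describe("True to delete the file"),
  move_to: z.string().optional().describe("New path, if the file is moved"),
});

export const EditedFilesSchema = z.object({
  ops: z.array(FileOperationSchema),
});
```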
Step-by-Step Manual Walkthrough (Simulating the CLI)
Let's imagine the user runs `codex "Refactor utils.ts to use arrow functions"` in the directory `/home/user/myproject`.
1. Initialization (`cli.tsx`, `app.tsx`):
   - Parse arguments: the prompt is "Refactor...", the model is the default (`o4-mini`), and the approval mode is the default (`suggest`).
   - Load config (`loadConfig` in `src/utils/config.ts`): read `~/.codex/config.yaml` and `~/.codex/instructions.md`.
   - Discover and load project docs (`loadProjectDoc` in `src/utils/config.ts`): find `/home/user/myproject/codex.md` and read its content.
   - Combine instructions: merge user instructions and project docs.
   - Check Git status (`checkInGit` in `src/utils/check-in-git.ts`): confirm `/home/user/myproject` is a Git repo.
   - Render the main UI (`TerminalChat`).
2. First API Call (`AgentLoop.run` in `src/utils/agent/agent-loop.ts`):
   - Create the initial input: `[{ role: "user", content: [{ type: "input_text", text: "Refactor..." }] }]`.
   - Construct the API request payload: include the system prompt (from `prefix`), the combined instructions, and the user input message. Set `model: "o4-mini"`, `stream: true`, `tools: [...]`. No `previous_response_id` (a sketch of this assembled payload appears after the walkthrough).
   - Send the request: call `openai.responses.create(...)` (using the `openai` library). The UI shows "Thinking...".
3. Model Response (Stream):
   - Assume the model decides it needs to read the file first.
   - Stream event 1: `response.output_item.done` with `item: { type: "function_call", name: "shell", arguments: '{"cmd": ["cat", "utils.ts"]}', call_id: "call_1" }`.
   - Stream event 2: `response.completed` with `output: [...]` containing the same function call, `id: "resp_1"`.
   - The agent receives the function call. `onLastResponseId` is called with `"resp_1"`.
4. Tool Call Handling (`handleExecCommand` in `src/utils/agent/handle-exec-command.ts`):
   - Parse arguments: `cmd = ["cat", "utils.ts"]`.
   - Check approval: `canAutoApprove(["cat", "utils.ts"], "suggest", ["/home/user/myproject"])` (in `src/approvals.ts`) returns `{ type: "auto-approve", reason: "View file contents", group: "Reading files", runInSandbox: false }`.
   - Execute the command (`execCommand` in `src/utils/agent/handle-exec-command.ts`): run `cat utils.ts` directly (no sandbox needed for safe commands). Note: we assume `utils.ts` exists at the project root for this example; in reality, the model might need to specify a path like `src/utils.ts`.
   - Simulated result: `stdout = "/* content of utils.ts */", stderr = "", exitCode = 0`.
5. Second API Call (`AgentLoop.run` continues):
   - Format the tool result: create a `function_call_output` item like `{ type: "function_call_output", call_id: "call_1", output: '{"output": "/* content ... */", "metadata": {"exit_code": 0, ...}}' }`.
   - Construct the API request payload: include the system prompt, the combined instructions, and the entire history so far (user message, assistant function-call request, function-call output); set `previous_response_id: "resp_1"`.
   - Send the request. The UI shows "Thinking...".
6. Model Response (Stream):
   - Assume the model generates the refactored code and decides to apply it.
   - Stream event 1: `response.output_item.done` with `item: { type: "function_call", name: "shell", arguments: '{"cmd": ["apply_patch", "*** Begin Patch\n*** Update File: utils.ts\n@@ ... -old +new ...\n*** End Patch"]}', call_id: "call_2" }`.
   - Stream event 2: `response.completed` with `output: [...]` containing the patch function call, `id: "resp_2"`.
   - The agent receives the patch function call. `onLastResponseId` is called with `"resp_2"`.
7. Tool Call Handling (Patch):
   - Parse arguments: identify `apply_patch` and extract the patch text.
   - Check approval: `canAutoApprove(["apply_patch", "..."], "suggest", ["/home/user/myproject"])`. Since the policy is `suggest`, this returns `{ type: "ask-user", applyPatch: { patch: "..." } }`.
   - Request confirmation (`requestConfirmation`): the UI (`TerminalChatCommandReview`) displays the patch diff and asks "Allow command? [y/N/e/a]".
   - The user reviews the diff and presses 'y'. `submitConfirmation` is called with `{ decision: ReviewDecision.YES }`.
   - Execute the patch (`execApplyPatch` in `src/utils/agent/exec.ts` -> `process_patch` in `src/utils/agent/apply-patch.ts`): read `utils.ts`, apply the diff logic, and write the modified content back using Node.js `fs.writeFileSync`.
   - Simulated result: `stdout = "Done!", stderr = "", exitCode = 0`.
8. Third API Call:
   - Format the tool result: create a `function_call_output` item for the patch, `{ call_id: "call_2", output: '{"output": "Done!", ...}' }`.
   - Construct the API request: include the history plus the patch result, with `previous_response_id: "resp_2"`.
   - Send the request.
9. Model Response (Final):
   - Assume the model confirms the refactoring is done.
   - Stream event 1: `response.output_item.done` with `item: { type: "message", role: "assistant", content: [{ type: "output_text", text: "OK, I've refactored utils.ts to use arrow functions." }] }`.
   - Stream event 2: `response.completed`, `id: "resp_3"`.
   - The agent receives the message. `onLastResponseId` is called with `"resp_3"`.
   - No more tool calls: the loop finishes for this turn, and the UI stops showing "Thinking...".
10. User Interaction:
    - The user sees the final message and an updated prompt, ready for the next command. The file `utils.ts` on their disk has been modified.
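For concreteness, the first-turn request assembled in step 2 might look roughly like this; the placeholder constants and the exact field wiring are assumptions, though the field names follow the Responses API:

```ts
// Illustrative first-turn payload for the walkthrough above. The two
// placeholder constants stand in for values the CLI assembles at startup.
const systemPromptPrefix = "You are Codex CLI..."; // the hardcoded `prefix`
const combinedUserInstructions = "<instructions.md + codex.md contents>";

const firstRequest = {
  model: "o4-mini",
  stream: true,
  instructions: `${systemPromptPrefix}\n${combinedUserInstructions}`,
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "Refactor utils.ts to use arrow functions" },
      ],
    },
  ],
  tools: [/* the `shell` function tool */],
  // No previous_response_id on the first turn.
};
```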