Gemini Interactions API Quick Start
The Interactions API is a unified interface for building with Gemini models and agents. It simplifies the development of agentic applications by handling server-side state management, tool orchestration, and long-running tasks.
With a single endpoint, you can:
- Interact with Gemini models for text, image, and audio generation
- Build multi-turn conversations without managing history client-side
- Call custom functions and built-in tools like Google Search
- Run specialized agents like Deep Research for complex tasks
Prerequisites
Get a Gemini API key and install the Google GenAI SDK:
pip install google-genai
Set your API key:
export GEMINI_API_KEY="your-api-key"
Create an interaction
At its simplest, the Interactions API works like a standard chat completion. You provide a model and an input string. By default, interactions are stored (store=True), allowing you to reference them later.
from google import genai
client = genai.Client()
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    system_instruction="You are a helpful assistant.",
    input="Explain quantum entanglement in one sentence."
)
print(interaction.outputs[-1].text)
# Output: Quantum entanglement is a phenomenon where particles become linked...
Stateful Multi-turn Conversations
One of the most powerful features of this API is server-side state management. You do not need to append messages to a list and send the full history back to the server every time.
Use previous_interaction_id to continue a conversation.
Note: Only conversation history is preserved. Parameters like tools or generation_config are interaction-scoped and must be re-declared if needed.
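To make that re-declaration habit concrete, here is a minimal sketch of a wrapper that re-applies interaction-scoped parameters (such as tools) on every continuation call. The wrapper and the stub client are illustrative helpers, not part of the SDK; the stub records what each call received so the sketch runs without credentials.

```python
# Illustrative only: re-apply interaction-scoped parameters on every turn,
# since `previous_interaction_id` carries conversation history but not
# `tools` or `generation_config`. `StubInteractions` stands in for
# `client.interactions` so this runs offline.

DEFAULT_TOOLS = [{"type": "google_search"}]

class StubInteractions:
    def __init__(self):
        self.calls = []  # record kwargs for inspection

    def create(self, **kwargs):
        self.calls.append(kwargs)
        return {"id": f"int_{len(self.calls)}"}

def continue_with_defaults(interactions, previous_id, user_input):
    """Continue a conversation, re-declaring per-turn parameters."""
    return interactions.create(
        model="gemini-3-flash-preview",
        input=user_input,
        previous_interaction_id=previous_id,
        tools=DEFAULT_TOOLS,  # must be re-sent each turn
    )

stub = StubInteractions()
turn_1 = stub.create(model="gemini-3-flash-preview", input="Hi", tools=DEFAULT_TOOLS)
turn_2 = continue_with_defaults(stub, turn_1["id"], "What's new today?")
```

Centralizing the defaults in one helper keeps later turns from silently losing tool access.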
from google import genai
client = genai.Client()
turn_1 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="My name is Alice and I am a software engineer."
)
print(f"Turn 1 ID: {turn_1.id}")
turn_2 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What is my job?",
    previous_interaction_id=turn_1.id
)
print(turn_2.outputs[-1].text)
# Output: You are a software engineer.
For client-managed history, see Stateless conversations.
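As a rough sketch of the client-managed alternative, the pattern below keeps the conversation history in a local list and resends the accumulated context with every request instead of passing previous_interaction_id. The text-part format mirrors the multimodal example in this guide; the stub class is purely illustrative and stands in for the real client so the snippet runs without an API key.

```python
# Illustrative client-managed history: accumulate text parts locally and
# send the full context on every turn. `StubInteractions` is a stand-in
# for `client.interactions` so this sketch runs offline.

class StubInteractions:
    def create(self, model, input):
        # Report how many parts of context were received.
        return {"parts_seen": len(input)}

history = []

def send(interactions, user_text):
    history.append({"type": "text", "text": user_text})
    response = interactions.create(model="gemini-3-flash-preview", input=list(history))
    # In a real app you would also append the model's reply to `history`.
    return response

stub = StubInteractions()
first = send(stub, "My name is Alice.")
second = send(stub, "What is my name?")
```

With a real client, each call would carry the entire conversation, trading larger requests for full client-side control over the context window.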
Forking Conversations
Because state is managed by ID, you can "fork" a conversation by referencing an older interaction ID with a different prompt.
# Branch off from Turn 1 with a different topic
turn_2_fork = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What is my name?",
    previous_interaction_id=turn_1.id
)
print(turn_2_fork.outputs[-1].text)
# Output: Your name is Alice.
Multimodal Interactions
Gemini models natively understand and generate multiple content types. You can pass text, images, audio, or PDF documents in a single interaction. This example uses a remote image URL.
Multimodal understanding
from google import genai
client = genai.Client()
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "Generate a recipe for the shown scones."},
        {
            "type": "image",
            "uri": "https://storage.googleapis.com/generativeai-downloads/images/scones.jpg"
        }
    ],
)
print(interaction.outputs[-1].text)
For audio, video, and document (PDF) understanding, see Multimodal understanding.
Multimodal Generation
import base64
from google import genai
client = genai.Client()
interaction = client.interactions.create(
    model="gemini-3-pro-image-preview",
    input="Generate an image of a futuristic city at sunset."
)
for output in interaction.outputs:
    if output.type == "image":
        with open("city.png", "wb") as f:
            f.write(base64.b64decode(output.data))
For audio generation, see Multimodal generation.
Tool use
Tools extend the model's capabilities by letting it call external functions or services. The API includes ready-to-use tools like Google Search, and lets you define custom tools as JSON schemas. The model decides when to call them based on the conversation:
Built-in tools
from google import genai
client = genai.Client()
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Who won the 2024 Nobel Prize in Physics?",
    tools=[{"type": "google_search"}]
)
text_output = next((o for o in interaction.outputs if o.type == "text"), None)
if text_output:
    print(text_output.text)
Other built-in tools include Code Execution and Computer Use.
Function calling
from google import genai
client = genai.Client()
# Define a tool
weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
    }
}
# Send request with tool
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What's the weather in Tokyo?",
    tools=[weather_tool]
)
# Handle tool call
for output in interaction.outputs:
    if output.type == "function_call":
        # Execute your function (mocked here)
        # result = get_weather(**output.arguments)
        result = {"temperature": "22°C", "condition": "sunny"}
        # Return result to model
        interaction = client.interactions.create(
            model="gemini-3-flash-preview",
            previous_interaction_id=interaction.id,
            input={
                "type": "function_result",
                "name": output.name,
                "call_id": output.id,
                "result": result
            }
        )
        print(interaction.outputs[-1].text)
For code execution, URL context, and MCP servers, see Agentic capabilities.
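For apps with more than one tool, the mocked step above generalizes to a registry that maps tool names to local Python functions and builds one function_result payload per call. The sketch below is self-contained: FakeCall mimics the shape of a function_call output as described in this guide (its field names are assumptions, not a verified SDK type), so the routing logic runs without a live request.

```python
# Hypothetical dispatch helper: route function_call outputs through a
# registry of local functions and build `function_result` payloads in the
# shape shown above. `FakeCall` mimics a function_call output object.

class FakeCall:
    def __init__(self, call_id, name, arguments):
        self.type = "function_call"
        self.id = call_id
        self.name = name
        self.arguments = arguments

def get_weather(location):
    # Mocked implementation; a real one would hit a weather service.
    return {"temperature": "22°C", "condition": "sunny", "location": location}

REGISTRY = {"get_weather": get_weather}

def execute_calls(outputs):
    """Return one function_result payload per function_call output."""
    results = []
    for output in outputs:
        if getattr(output, "type", None) != "function_call":
            continue
        fn = REGISTRY[output.name]
        results.append({
            "type": "function_result",
            "name": output.name,
            "call_id": output.id,
            "result": fn(**output.arguments),
        })
    return results

payloads = execute_calls([FakeCall("call_1", "get_weather", {"location": "Tokyo"})])
```

Each payload in the returned list can then be sent back as the input of a follow-up create call, as in the example above.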
Agents & Long-Running Tasks
Beyond models, the Interactions API provides access to specialized agents. Deep Research executes multi-step research tasks, synthesizing information from multiple sources into comprehensive reports.
Agents run asynchronously with background=True. Poll the interaction status
to retrieve results:
import time
from google import genai
client = genai.Client()
agent_interaction = client.interactions.create(
    agent="deep-research-pro-preview-12-2025",  # Note: use 'agent', not 'model'
    input="Research the history of the Google TPUs with a focus on 2025 specs.",
    background=True
)
# Poll for completion
while True:
    status_check = client.interactions.get(agent_interaction.id)
    print(f"Status: {status_check.status}")
    if status_check.status == "completed":
        print("\n--- Final Report ---\n")
        print(status_check.outputs[-1].text)
        break
    elif status_check.status in ["failed", "cancelled"]:
        print("Agent failed.")
        break
    time.sleep(10)
For more details, see Deep Research.
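In production you will usually want a timeout and backoff around that loop. The helper below is one way to package it; FakeClient simulates an agent that completes after a few polls so the sketch runs without credentials (swap in client.interactions and a real interaction ID to use it for real).

```python
import time

# A reusable variant of the polling loop above, with a timeout and simple
# exponential backoff. `FakeClient` is illustrative: it "completes" after
# a few polls so this runs offline.

class FakeClient:
    def __init__(self, completes_after=3):
        self._polls = 0
        self._completes_after = completes_after

    class _Status:
        def __init__(self, status, text=None):
            self.status = status
            self.outputs = [type("O", (), {"text": text})()] if text else []

    def get(self, interaction_id):
        self._polls += 1
        if self._polls >= self._completes_after:
            return self._Status("completed", "Final report text.")
        return self._Status("in_progress")

def wait_for_completion(interactions, interaction_id, timeout=600, poll_interval=0.01):
    """Poll until the interaction completes, fails, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status_check = interactions.get(interaction_id)
        if status_check.status == "completed":
            return status_check
        if status_check.status in ("failed", "cancelled"):
            raise RuntimeError(f"Agent ended with status: {status_check.status}")
        time.sleep(poll_interval)
        poll_interval = min(poll_interval * 2, 30)  # back off, capped at 30s
    raise TimeoutError("Agent did not finish before the timeout.")

final = wait_for_completion(FakeClient(), "fake-id")
```

The backoff keeps request volume low on long-running research tasks while still returning promptly once the agent finishes.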
Next Steps
The Interactions API supports many more complex workflows. Check the API Reference for details on:
- Structured Outputs: Force the model to return valid JSON matching a specific schema.
- Streaming: Stream token responses for real-time applications.
- Thinking Models: Configure thinking_level for Gemini 2.5 and 3.0 models to handle complex reasoning.
- Remote MCP: Connect Gemini to your own private MCP servers.
- Combining tools and structured outputs: Use both together to build more complex workflows.
- File Uploads: Upload files to the model for processing.
- Data Model: A high-level overview of the main inputs and outputs of the API.
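Until you adopt server-side Structured Outputs, a lightweight client-side check is a common stopgap for JSON returned in model text. The sketch below is generic Python, not Interactions API code: the sample string stands in for a model response, and the required-keys check mirrors the required field of the JSON-schema tool declarations shown earlier.

```python
import json

# Client-side guard for JSON returned as model text. `model_text` is a
# stand-in for a real response; swap in `interaction.outputs[-1].text`.

REQUIRED_KEYS = {"title", "summary"}

def parse_report(text):
    """Parse model output as JSON and verify required keys are present."""
    data = json.loads(text)  # raises ValueError on malformed JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return data

model_text = '{"title": "TPU history", "summary": "A brief overview."}'
report = parse_report(model_text)
```

Server-side Structured Outputs makes this guard unnecessary by constraining generation to the schema directly.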
Feedback
This API is in Beta, and we want your feedback! We're actively listening to developers to shape the future of this API. What features would help your agent workflows? What pain points are you experiencing? Please let us know on Twitter or LinkedIn.