Gemini Interactions API Quick Start
The Interactions API is a unified interface for building with Gemini models and agents. It simplifies the development of agentic applications by handling server-side state management, tool orchestration, and long-running tasks.
With a single endpoint, you can:
- Interact with Gemini models for text, image, and audio generation
- Build multi-turn conversations without managing history client-side
- Call custom functions and built-in tools like Google Search
- Run specialized agents like Deep Research for complex tasks
Prerequisites
Get a Gemini API key and install the Google GenAI SDK:
pip install google-genai
Set your API key:
export GEMINI_API_KEY="your-api-key"
Create an interaction
At its simplest, the Interactions API works like a standard chat completion. You provide a model and an input string. By default, interactions are stored (store=True), allowing you to reference them later.
from google import genai
client = genai.Client()
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    system_instruction="You are a helpful assistant.",
    input="Explain quantum entanglement in one sentence."
)
print(interaction.outputs[-1].text)
# Output: Quantum entanglement is a phenomenon where particles become linked...
Stateful Multi-turn Conversations
One of the most powerful features of this API is server-side state management. You do not need to append messages to a list and send the full history back to the server every time.
Use previous_interaction_id to continue a conversation.
Note: Only conversation history is preserved. Parameters like tools or generation_config are interaction-scoped and must be re-declared if needed.
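To make that re-declaration habit concrete, here is a minimal sketch of a wrapper that re-applies interaction-scoped parameters (such as tools) on every continuation call. The wrapper and the stub client are illustrative helpers, not part of the SDK; the stub records what each call received so the sketch runs without credentials.

```python
# Illustrative only: re-apply interaction-scoped parameters on every turn,
# since `previous_interaction_id` carries conversation history but not
# `tools` or `generation_config`. `StubInteractions` stands in for
# `client.interactions` so this runs offline.

DEFAULT_TOOLS = [{"type": "google_search"}]

class StubInteractions:
    def __init__(self):
        self.calls = []  # record kwargs for inspection

    def create(self, **kwargs):
        self.calls.append(kwargs)
        return {"id": f"int_{len(self.calls)}"}

def continue_with_defaults(interactions, previous_id, user_input):
    """Continue a conversation, re-declaring per-turn parameters."""
    return interactions.create(
        model="gemini-3-flash-preview",
        input=user_input,
        previous_interaction_id=previous_id,
        tools=DEFAULT_TOOLS,  # must be re-sent each turn
    )

stub = StubInteractions()
turn_1 = stub.create(model="gemini-3-flash-preview", input="Hi", tools=DEFAULT_TOOLS)
turn_2 = continue_with_defaults(stub, turn_1["id"], "What's new today?")
```

Centralizing the defaults in one helper keeps later turns from silently losing tool access.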
from google import genai
client = genai.Client()
turn_1 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="My name is Alice and I am a software engineer."
)
print(f"Turn 1 ID: {turn_1.id}")
turn_2 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What is my job?",
    previous_interaction_id=turn_1.id
)
print(turn_2.outputs[-1].text)
# Output: You are a software engineer.
For client-managed history, see Stateless conversations.
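As a rough sketch of the client-managed alternative, the pattern below keeps the conversation history in a local list and resends the accumulated context with every request instead of passing previous_interaction_id. The text-part format mirrors the multimodal example in this guide; the stub class is purely illustrative and stands in for the real client so the snippet runs without an API key.

```python
# Illustrative client-managed history: accumulate text parts locally and
# send the full context on every turn. `StubInteractions` is a stand-in
# for `client.interactions` so this sketch runs offline.

class StubInteractions:
    def create(self, model, input):
        # Report how many parts of context were received.
        return {"parts_seen": len(input)}

history = []

def send(interactions, user_text):
    history.append({"type": "text", "text": user_text})
    response = interactions.create(model="gemini-3-flash-preview", input=list(history))
    # In a real app you would also append the model's reply to `history`.
    return response

stub = StubInteractions()
first = send(stub, "My name is Alice.")
second = send(stub, "What is my name?")
```

With a real client, each call would carry the entire conversation, trading larger requests for full client-side control over the context window.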
Forking Conversations
Because state is managed by ID, you can "fork" a conversation by referencing an older interaction ID with a different prompt.
# Branch off from Turn 1 with a different topic
turn_2_fork = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What is my name?",
    previous_interaction_id=turn_1.id
)
print(turn_2_fork.outputs[-1].text)
# Output: Your name is Alice.
Multimodal Interactions
Gemini models natively understand and generate multiple content types. You can pass text, images, audio, or PDF documents in a single interaction. This example uses a remote image URL.
Multimodal understanding
from google import genai
client = genai.Client()
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "Generate a recipe for the shown scones."},
        {
            "type": "image",
            "uri": "https://storage.googleapis.com/generativeai-downloads/images/scones.jpg"
        }
    ],
)
print(interaction.outputs[-1].text)
For audio, video, and document (PDF) understanding, see Multimodal understanding.
Multimodal Generation
import base64
from google import genai
client = genai.Client()
interaction = client.interactions.create(
    model="gemini-3-pro-image-preview",
    input="Generate an image of a futuristic city at sunset."
)
for output in interaction.outputs:
    if output.type == "image":
        with open("city.png", "wb") as f:
            f.write(base64.b64decode(output.data))
For audio generation, see Multimodal generation.
Tool use
Tools extend the model's capabilities by letting it call external functions or services. The API includes ready-to-use tools like Google Search, and lets you define custom tools as JSON schemas. The model decides when to call them based on the conversation:
Built-in tools
from google import genai
client = genai.Client()
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Who won the 2024 Nobel Prize in Physics?",
    tools=[{"type": "google_search"}]
)
text_output = next((o for o in interaction.outputs if o.type == "text"), None)
if text_output:
    print(text_output.text)
Other built-in tools include Code Execution and Computer Use.
Function calling
from google import genai
client = genai.Client()
# Define a tool
weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
    }
}
# Send request with tool
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What's the weather in Tokyo?",
    tools=[weather_tool]
)
# Handle tool call
for output in interaction.outputs:
    if output.type == "function_call":
        # Execute your function (mocked here)
        # result = get_weather(**output.arguments)
        result = {"temperature": "22°C", "condition": "sunny"}
        # Return result to model
        interaction = client.interactions.create(
            model="gemini-3-flash-preview",
            previous_interaction_id=interaction.id,
            input={
                "type": "function_result",
                "name": output.name,
                "call_id": output.id,
                "result": result
            }
        )
        print(interaction.outputs[-1].text)
For code execution, URL context, and MCP servers, see Agentic capabilities.
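For apps with more than one tool, the mocked step above generalizes to a registry that maps tool names to local Python functions and builds one function_result payload per call. The sketch below is self-contained: FakeCall mimics the shape of a function_call output as described in this guide (its field names are assumptions, not a verified SDK type), so the routing logic runs without a live request.

```python
# Hypothetical dispatch helper: route function_call outputs through a
# registry of local functions and build `function_result` payloads in the
# shape shown above. `FakeCall` mimics a function_call output object.

class FakeCall:
    def __init__(self, call_id, name, arguments):
        self.type = "function_call"
        self.id = call_id
        self.name = name
        self.arguments = arguments

def get_weather(location):
    # Mocked implementation; a real one would hit a weather service.
    return {"temperature": "22°C", "condition": "sunny", "location": location}

REGISTRY = {"get_weather": get_weather}

def execute_calls(outputs):
    """Return one function_result payload per function_call output."""
    results = []
    for output in outputs:
        if getattr(output, "type", None) != "function_call":
            continue
        fn = REGISTRY[output.name]
        results.append({
            "type": "function_result",
            "name": output.name,
            "call_id": output.id,
            "result": fn(**output.arguments),
        })
    return results

payloads = execute_calls([FakeCall("call_1", "get_weather", {"location": "Tokyo"})])
```

Each payload in the returned list can then be sent back as the input of a follow-up create call, as in the example above.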
Agents & Long-Running Tasks
Beyond models, the Interactions API provides access to specialized agents. Deep Research executes multi-step research tasks, synthesizing information from multiple sources into comprehensive reports.
Agents run asynchronously with background=True. Poll the interaction status
to retrieve results:
import time
from google import genai
client = genai.Client()
agent_interaction = client.interactions.create(
    agent="deep-research-pro-preview-12-2025",  # Note: use 'agent', not 'model'
    input="Research the history of the Google TPUs with a focus on 2025 specs.",
    background=True
)
# Poll for completion
while True:
    status_check = client.interactions.get(agent_interaction.id)
    print(f"Status: {status_check.status}")
    if status_check.status == "completed":
        print("\n--- Final Report ---\n")
        print(status_check.outputs[-1].text)
        break
    elif status_check.status in ["failed", "cancelled"]:
        print("Agent failed.")
        break
    time.sleep(10)
For more details, see Deep Research.
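In production you will usually want a timeout and backoff around that loop. The helper below is one way to package it; FakeClient simulates an agent that completes after a few polls so the sketch runs without credentials (swap in client.interactions and a real interaction ID to use it for real).

```python
import time

# A reusable variant of the polling loop above, with a timeout and simple
# exponential backoff. `FakeClient` is illustrative: it "completes" after
# a few polls so this runs offline.

class FakeClient:
    def __init__(self, completes_after=3):
        self._polls = 0
        self._completes_after = completes_after

    class _Status:
        def __init__(self, status, text=None):
            self.status = status
            self.outputs = [type("O", (), {"text": text})()] if text else []

    def get(self, interaction_id):
        self._polls += 1
        if self._polls >= self._completes_after:
            return self._Status("completed", "Final report text.")
        return self._Status("in_progress")

def wait_for_completion(interactions, interaction_id, timeout=600, poll_interval=0.01):
    """Poll until the interaction completes, fails, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status_check = interactions.get(interaction_id)
        if status_check.status == "completed":
            return status_check
        if status_check.status in ("failed", "cancelled"):
            raise RuntimeError(f"Agent ended with status: {status_check.status}")
        time.sleep(poll_interval)
        poll_interval = min(poll_interval * 2, 30)  # back off, capped at 30s
    raise TimeoutError("Agent did not finish before the timeout.")

final = wait_for_completion(FakeClient(), "fake-id")
```

The backoff keeps request volume low on long-running research tasks while still returning promptly once the agent finishes.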
Next Steps
The Interactions API supports many more complex workflows. Check the API Reference for details on:
- Structured Outputs: Force the model to return valid JSON matching a specific schema.
- Streaming: Stream token responses for real-time applications.
- Thinking Models: Configure thinking_level for Gemini 2.5 and 3.0 models to handle complex reasoning.
- Remote MCP: Connect Gemini to your own private MCP servers.
- Combining tools and structured outputs: Use both together to build more complex workflows.
- File Uploads: Upload files to the model for processing.
- Data Model: A high-level overview of the main inputs and outputs of the API.
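Until you adopt server-side Structured Outputs, a lightweight client-side check is a common stopgap for JSON returned in model text. The sketch below is generic Python, not Interactions API code: the sample string stands in for a model response, and the required-keys check mirrors the required field of the JSON-schema tool declarations shown earlier.

```python
import json

# Client-side guard for JSON returned as model text. `model_text` is a
# stand-in for a real response; swap in `interaction.outputs[-1].text`.

REQUIRED_KEYS = {"title", "summary"}

def parse_report(text):
    """Parse model output as JSON and verify required keys are present."""
    data = json.loads(text)  # raises ValueError on malformed JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return data

model_text = '{"title": "TPU history", "summary": "A brief overview."}'
report = parse_report(model_text)
```

Server-side Structured Outputs makes this guard unnecessary by constraining generation to the schema directly.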
Feedback
This API is in Beta, and we want your feedback! We're actively listening to developers to shape the future of this API. What features would help your agent workflows? What pain points are you experiencing? Please let us know on Twitter or LinkedIn.