Getting started with the Gemini Interactions API

June 23, 2026 · 7 minute read

The Interactions API is Google's primary interface for Gemini models and agents. A single endpoint covers text generation, streaming, multi-turn chat, multimodal inputs, image generation, structured output, tool use, function calling, managed agents, and background execution.

This guide uses JavaScript. For Python and REST examples, see the Interactions API quickstart.

Using a coding agent? Install the skill so your agent stays current with Interactions API patterns:

Shell

npx skills add google-gemini/gemini-skills --skill gemini-interactions-api

Setup

Create a free API key at Google AI Studio, then set it as an environment variable:

Shell

export GEMINI_API_KEY="YOUR_API_KEY"

Install the SDK:

Shell

npm install @google/genai

Send your first request:

JavaScript

import { GoogleGenAI } from "@google/genai";
 
const ai = new GoogleGenAI({});
 
const interaction = await ai.interactions.create({
  model: "gemini-3.5-flash",
  input: "Explain how AI works in a few words",
});
console.log(interaction.output_text);

interaction.output_text gives you the final text directly. See the text generation guide for system instructions and generation config.
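The client above is created with an empty options object, so it depends on the GEMINI_API_KEY variable from the Setup step being present. As a minimal sketch, a small guard can fail early with a readable message if the key is missing. The helper name requireApiKey is hypothetical and is not part of @google/genai.

```javascript
// Hypothetical helper (not part of the SDK): fail fast when the key is absent.
// `env` defaults to process.env so the function is easy to test with a plain object.
function requireApiKey(env = process.env) {
  const key = env.GEMINI_API_KEY;
  const missing =
    typeof key !== "string" ||
    key.trim() === "" ||
    key === "YOUR_API_KEY"; // the placeholder from the Setup step was never replaced
  if (missing) {
    throw new Error(
      "GEMINI_API_KEY is not set. Export it before creating the client."
    );
  }
  return key;
}
```

Calling requireApiKey() once at startup, before new GoogleGenAI({}), turns a confusing authentication failure into an immediate, local error.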

Streaming

Add stream: true and iterate over the events. Each event with event_type === "step.delta" and delta.type === "text" carries a chunk of text you can display immediately.

JavaScript

import { GoogleGenAI } from "@google/genai";
 
const ai = new GoogleGenAI({});
 
const stream = await ai.interactions.create({
  model: "gemini-3.5-flash",
  input: "Explain how AI works",
  stream: true,
});
 
for await (const event of stream) {
  if (event.event_type === "step.delta") {
    if (event.delta.type === "text") {
      process.stdout.write(event.delta.text);
    }
  }
}

See the streaming guide for event types and delta handling.
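If you also need the complete response once the stream ends (for logging or caching, say), you can accumulate the text deltas as they arrive. The sketch below assumes only the event shape used in the example above (event_type, delta.type, delta.text). The names appendTextDelta and collectText are hypothetical helpers, not SDK functions.

```javascript
// Pure reducer: returns the buffer extended by the event's text,
// or the buffer unchanged for any other kind of event.
function appendTextDelta(buffer, event) {
  if (event.event_type === "step.delta" && event.delta?.type === "text") {
    return buffer + event.delta.text;
  }
  return buffer;
}

// Consumes any async iterable of events, forwards each new text chunk
// to onChunk, and resolves with the full text once the stream ends.
async function collectText(stream, onChunk = () => {}) {
  let text = "";
  for await (const event of stream) {
    const next = appendTextDelta(text, event);
    if (next !== text) onChunk(next.slice(text.length));
    text = next;
  }
  return text;
}
```

With the stream from the example above, collectText(stream, (chunk) => process.stdout.write(chunk)) would print chunks as they arrive and still give you the whole string at the end.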

Multi-turn conversations

Chain interactions by passing previous_interaction_id. The server manages history for you.

JavaScript

import { GoogleGenAI } from "@google/genai";
 
const ai = new GoogleGenAI({});
 
const interaction1 = await ai.interactions.create({
  model: "gemini-3.5-flash",
  input: "I have 2 dogs in my house.",
});
console.log("Response 1:", interaction1.output_text);
 
const interaction2 = await ai.interactions.create({
  model: "gemini-3.5-flash",
  input: "How many paws are in my house?",
  previous_interaction_id: interaction1.id,
});
console.log("Response 2:", interaction2.output_text);

For client-side history management, set store: false. See the multi-turn guide.
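For the server-managed case, threading previous_interaction_id by hand gets repetitive after a few turns. Here is a minimal sketch of a wrapper that remembers the last interaction ID for you. createChat is a hypothetical helper; it relies only on the ai.interactions.create call and the id and output_text fields shown above.

```javascript
// Hypothetical convenience wrapper around ai.interactions.create.
// It remembers the last interaction ID so each send() continues the same chat.
function createChat(ai, model) {
  let previousId; // undefined until the first turn completes

  return {
    async send(input) {
      const request = { model, input };
      // Only link to a previous turn once one exists.
      if (previousId) request.previous_interaction_id = previousId;

      const interaction = await ai.interactions.create(request);
      previousId = interaction.id;
      return interaction.output_text;
    },
  };
}
```

Usage would look like const chat = createChat(ai, "gemini-3.5-flash"), followed by await chat.send("I have 2 dogs in my house.") and await chat.send("How many paws are in my house?").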

Multimodal understanding

Gemini natively understands images, audio, video, and documents. Upload a file and pass it alongside text.

JavaScript

import { GoogleGenAI } from "@google/genai";
 
const ai = new GoogleGenAI({});
 
const uploadedFile = await ai.files.upload({ file: "photo.jpg" });
 
const interaction = await ai.interactions.create({
  model: "gemini-3.5-flash",
  input: [
    { type: "text", text: "What is in this image?" },
    {
      type: "image",
      uri: uploadedFile.uri,
      mime_type: uploadedFile.mimeType,
    },
  ],
});
console.log(interaction.output_text);

Audio, video, and documents use the same structure. See the guides for audio, video, and document processing.
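To illustrate "the same structure", here is a rough sketch that builds an input part from an uploaded file based on its MIME type. Treat the type strings "audio", "video", and "document" as assumptions on my part (only "image" appears in the example above), and check the linked guides for the exact values. partFromUpload is a hypothetical helper.

```javascript
// Hypothetical helper: map an uploaded file to an input part.
// ASSUMPTION: the part types other than "image" are guesses based on
// the media names in the text, not confirmed API values.
function partFromUpload(uploadedFile) {
  const mime = uploadedFile.mimeType ?? "";
  let type;
  if (mime.startsWith("image/")) type = "image";
  else if (mime.startsWith("audio/")) type = "audio";
  else if (mime.startsWith("video/")) type = "video";
  else if (mime === "application/pdf") type = "document";
  else throw new Error(`Unsupported MIME type: ${mime || "(none)"}`);

  // Same shape as the image part in the example above.
  return { type, uri: uploadedFile.uri, mime_type: mime };
}
```

The returned object can sit in the input array next to a { type: "text", ... } part, exactly where the image part goes in the example.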

Image generation

Generate images with Nano Banana 2 using the gemini-3.1-flash-image model.

JavaScript

import { GoogleGenAI } from "@google/genai";
import fs from "node:fs";
 
const ai = new GoogleGenAI({});
 
const interaction = await ai.interactions.create({
  model: "gemini-3.1-flash-image",
  input: "Generate an image of a futuristic city skyline at sunset",
});
 
fs.writeFileSync(
  "generated_image.png",
  Buffer.from(interaction.output_image.data, "base64")
);

Speech generation (multi-speaker TTS) and music generation (Lyria 3) work the same way. See the image generation guide for editing, aspect ratios, and style references.

Structured output

Get JSON that matches a schema you define. The example below passes a JSON Schema to the API and validates the parsed response with Zod.

JavaScript

import { GoogleGenAI } from "@google/genai";
import * as z from "zod";
 
const ai = new GoogleGenAI({});
 
const recipeJsonSchema = {
  type: "object",
  properties: {
    recipe_name: { type: "string", description: "Name of the recipe." },
    ingredients: {
      type: "array", items: { type: "string" }, description: "List of ingredients."
    },
    prep_time_minutes: { type: "integer", description: "Prep time in minutes." }
  },
  required: ["recipe_name", "ingredients"]
};
 
const recipeSchema = z.fromJSONSchema(recipeJsonSchema);
 
const interaction = await ai.interactions.create({
  model: "gemini-3.5-flash",
  input: "Give me a recipe for banana bread",
  response_format: {
    type: "text",
    mime_type: "application/json",
    schema: recipeJsonSchema
  },
});
 
const recipe = recipeSchema.parse(JSON.parse(interaction.output_text));
console.log(recipe);

See the structured output guide for recursive schemas and enums.
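Both JSON.parse and recipeSchema.parse throw on bad input, so in production code you may prefer a result object over an exception. Below is a dependency-free sketch that checks the same fields as the recipe schema above by hand. parseRecipe is a hypothetical helper, and the { ok, value, error } result shape is my own convention, not something the API returns.

```javascript
// Hypothetical helper: parse and validate model output without throwing.
// Mirrors the recipe schema above: recipe_name and ingredients are required,
// prep_time_minutes is an optional integer.
function parseRecipe(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (err) {
    return { ok: false, error: `Invalid JSON: ${err.message}` };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, error: "Expected a JSON object" };
  }
  if (typeof data.recipe_name !== "string") {
    return { ok: false, error: "recipe_name must be a string" };
  }
  const ingredientsOk =
    Array.isArray(data.ingredients) &&
    data.ingredients.every((item) => typeof item === "string");
  if (!ingredientsOk) {
    return { ok: false, error: "ingredients must be an array of strings" };
  }
  if (
    data.prep_time_minutes !== undefined &&
    !Number.isInteger(data.prep_time_minutes)
  ) {
    return { ok: false, error: "prep_time_minutes must be an integer" };
  }
  return { ok: true, value: data };
}
```

If you are already using Zod, recipeSchema.safeParse gives you a similar non-throwing result for the validation step; you would still need to guard the JSON.parse call yourself.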

Tools (Google Search)

Ground responses in real-time data by passing tools: [{ type: "google_search" }].

JavaScript

import { GoogleGenAI } from "@google/genai";
 
const ai = new GoogleGenAI({});
 
const interaction = await ai.interactions.create({
  model: "gemini-3.5-flash",
  input: "Who won Euro 2024?",
  tools: [{ type: "google_search" }],
});
console.log(interaction.output_text);

Other built-in tools: Code Execution, URL Context, File Search, Google Maps, Computer Use. Mix multiple tools in one request. See the tool combination guide.

Function calling

Declare functions, let the model decide when to call them, execute locally, and return results.

JavaScript

import { GoogleGenAI } from "@google/genai";
 
const ai = new GoogleGenAI({});
 
const weatherTool = {
  type: "function",
  name: "get_current_temperature",
  description: "Gets the current temperature for a given location.",
  parameters: {
    type: "object",
    properties: {
      location: {
        type: "string",
        description: "The city name, e.g. San Francisco",
      },
    },
    required: ["location"],
  },
};
 
const availableFunctions = {
  get_current_temperature: ({ location }) => ({
    location, temperature: "22", unit: "celsius"
  }),
};
 
let input = "What is the temperature in London?";
let previousId = null;
let interaction;
 
while (true) {
  interaction = await ai.interactions.create({
    model: "gemini-3.5-flash",
    input,
    tools: [weatherTool],
    previous_interaction_id: previousId,
  });
 
  const functionResults = [];
  for (const step of interaction.steps) {
    if (step.type === "function_call") {
      const result = availableFunctions[step.name](step.arguments);
      console.log(`Called ${step.name}(${JSON.stringify(step.arguments)}) →`, result);
      functionResults.push({
        type: "function_result",
        name: step.name,
        call_id: step.id,
        result: [{ type: "text", text: JSON.stringify(result) }],
      });
    }
  }
 
  if (functionResults.length === 0) break;
 
  input = functionResults;
  previousId = interaction.id;
}
 
console.log(interaction.output_text);

The model returns status: "requires_action" with function_call steps. You execute the matching functions locally and submit function_result steps back as the next input, linked to the previous turn through previous_interaction_id. See the function calling guide for parallel calls and function choice modes.
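One thing the loop above does not handle is the model asking for a function you never registered, or a function that throws. Here is a sketch of the dispatch step pulled out into its own function with both cases covered. runFunctionCalls is a hypothetical helper, and reporting failures as an { error: ... } payload is my assumption about what is useful to send back, not a documented convention.

```javascript
// Hypothetical helper: turn function_call steps into function_result items.
// Uses the same step fields as the loop above (type, name, arguments, id).
function runFunctionCalls(steps, availableFunctions) {
  const results = [];
  for (const step of steps) {
    if (step.type !== "function_call") continue;

    let payload;
    // Own-property check so names like "toString" are not treated as tools.
    const known = Object.prototype.hasOwnProperty.call(availableFunctions, step.name);
    if (!known || typeof availableFunctions[step.name] !== "function") {
      payload = { error: `Unknown function: ${step.name}` };
    } else {
      try {
        payload = availableFunctions[step.name](step.arguments);
      } catch (err) {
        // ASSUMPTION: returning the message lets the model recover or explain.
        payload = { error: String(err?.message ?? err) };
      }
    }

    results.push({
      type: "function_result",
      name: step.name,
      call_id: step.id,
      result: [{ type: "text", text: JSON.stringify(payload) }],
    });
  }
  return results;
}
```

Inside the loop you would then write const functionResults = runFunctionCalls(interaction.steps, availableFunctions). Note that this version is synchronous; if your functions return promises, await them before serializing.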

Managed agents

Run an agent in a remote sandbox with code execution, web browsing, and file management. Pass agent instead of model and set environment: "remote".

JavaScript

import { GoogleGenAI } from "@google/genai";
 
const ai = new GoogleGenAI({});
 
const interaction = await ai.interactions.create({
  agent: "antigravity-preview-05-2026",
  input: "Write a script that generates the first 20 Fibonacci numbers and saves them to fibonacci.txt.",
  environment: "remote",
});
console.log(interaction.output_text);

Define custom agents with your own instructions, skills, and data sources. See the Managed Agents quickstart.

Background execution

Set background: true for long-running tasks. The call returns immediately and you poll for results.

JavaScript

import { GoogleGenAI } from "@google/genai";
 
const ai = new GoogleGenAI({});
 
const interaction = await ai.interactions.create({
  model: "gemini-3.5-flash",
  input: "Write a detailed analysis of AI in healthcare.",
  background: true,
});
console.log(`Task started: ${interaction.id} (status: ${interaction.status})`);
 
const poll = setInterval(async () => {
  const result = await ai.interactions.get(interaction.id);
  if (result.status === "completed") {
    console.log(result.output_text);
    clearInterval(poll);
  } else if (result.status === "failed") {
    console.error("Failed:", result.error);
    clearInterval(poll);
  }
}, 5000);

See the background execution guide.
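The setInterval version above keeps polling indefinitely if the task never finishes, and a slow request can overlap with the next tick. As an alternative sketch, an awaited loop with a deadline avoids both. pollUntilDone and its intervalMs / timeoutMs options are hypothetical names; the only API assumptions are the "completed" and "failed" statuses used above.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call getInteraction() until it reports a terminal status.
// Any status other than "completed" or "failed" is treated as "still running".
async function pollUntilDone(
  getInteraction,
  { intervalMs = 5000, timeoutMs = 300000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await getInteraction(); // one request at a time, never overlapping
    if (result.status === "completed") return result;
    if (result.status === "failed") {
      throw new Error(`Interaction failed: ${JSON.stringify(result.error)}`);
    }
    // Give up if the next attempt would land after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error("Timed out waiting for the interaction");
    }
    await sleep(intervalMs);
  }
}
```

With the interaction from the example above, this would be called as const result = await pollUntilDone(() => ai.interactions.get(interaction.id)), wrapped in try/catch to handle failure and timeout.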

Thanks for reading! If you have any questions, feel free to contact me on Twitter or LinkedIn.