Responses API
POST /v1/responses
The Responses API is compatible with OpenAI's Responses API format and provides a streamlined interface for generating model responses. The endpoint accepts an input field in place of messages and converts it to the internal format automatically. input can be either a string for simple queries or an array of message objects for conversations.
HTTP Request
```shell
curl https://api.apertis.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <APERTIS_API_KEY>" \
  -d '{
    "model": "gpt-4.1",
    "input": "What is the capital of France?"
  }'
```
Authentication
| Header | Format | Example |
|---|---|---|
| Authorization | Bearer token | Authorization: Bearer sk-your-api-key |
Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | The model to use for generating the response |
| input | string/array | The input text or array of message objects |
Optional Parameters
| Parameter | Type | Description |
|---|---|---|
| instructions | string | High-level instructions for model behavior |
| stream | boolean | Enable streaming responses. Default: false |
| temperature | number | Sampling temperature (0-2). Default: 1 |
| max_tokens | integer | Maximum tokens in the response |
| max_output_tokens | integer | Upper bound on generated tokens, including reasoning tokens |
| tools | array | List of tools the model can use (including web search) |
| tool_choice | string/object | Controls tool selection behavior |
| max_tool_calls | integer | Maximum number of tool calls allowed |
| parallel_tool_calls | boolean | Whether to allow parallel tool execution |
| reasoning | object | Reasoning configuration (see below) |
| text | object | Text output configuration (see below) |
| store | boolean | Whether to store the response for state management |
| previous_response_id | string | ID of a previous response, for multi-turn conversations |
| truncation | string | Context overflow handling (e.g., "auto") |
| metadata | object | Request metadata (up to 16 key-value pairs) |
Reasoning Parameter
The reasoning parameter configures the model's reasoning behavior:
| Option | Type | Description |
|---|---|---|
| effort | string | Reasoning effort level: low, medium, high |
| summary | string | Summary style: auto, concise, detailed |
```python
response = client.responses.create(
    model="o1-preview",
    input="Solve this complex math problem...",
    reasoning={
        "effort": "high",
        "summary": "detailed"
    }
)
```
Text Parameter
The text parameter configures text output:
| Option | Type | Description |
|---|---|---|
| format | object | Output format configuration (e.g., {"type": "json_object"}) |
| verbosity | string | Output verbosity: concise, normal, verbose |
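Like reasoning, the text options are passed as a nested object. A sketch of a request body using values from the table above (the specific combination of json_object format and concise verbosity is illustrative, not prescribed):

```json
{
  "model": "gpt-4.1",
  "input": "List three primary colors as a JSON object.",
  "text": {
    "format": {"type": "json_object"},
    "verbosity": "concise"
  }
}
```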
Input Formats
String Input (simple query):
```json
{
  "model": "gpt-4.1",
  "input": "What is the capital of France?"
}
```
Array Input (conversation):
```json
{
  "model": "gpt-4.1",
  "input": [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "user", "content": "What's 2+2?"}
  ]
}
```
Example Usage
Python
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.apertis.ai/v1"
)

response = client.responses.create(
    model="gpt-4.1",
    input="Explain quantum computing in simple terms."
)

print(response.output_text)
```
JavaScript
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.apertis.ai/v1'
});

const response = await client.responses.create({
  model: 'gpt-4.1',
  input: 'Explain quantum computing in simple terms.'
});

console.log(response.output_text);
```
With Web Search
Use the tools parameter to enable web search:
```python
response = client.responses.create(
    model="gpt-4.1",
    input="What are the latest news about AI?",
    tools=[{"type": "web_search_preview"}]
)
```
With Reasoning
```python
response = client.responses.create(
    model="o1-preview",
    input="Prove the Pythagorean theorem step by step.",
    reasoning={
        "effort": "high",
        "summary": "detailed"
    }
)
```
Streaming
```python
stream = client.responses.create(
    model="gpt-4.1",
    input="Write a short story about a robot.",
    stream=True
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```
Multi-turn Conversation
```python
response = client.responses.create(
    model="gpt-4.1",
    input=[
        {"role": "user", "content": "What is Python?"},
        {"role": "assistant", "content": "Python is a high-level programming language..."},
        {"role": "user", "content": "How do I install it?"}
    ]
)
```
With Instructions
Use instructions to provide high-level guidance for model behavior:
```python
response = client.responses.create(
    model="gpt-4.1",
    instructions="You are a helpful coding assistant. Always provide code examples.",
    input="How do I read a file in Python?"
)
```
Stateful Conversations
Use store and previous_response_id for multi-turn conversations with state:
```python
# First request - store the response
response1 = client.responses.create(
    model="gpt-4.1",
    input="My name is Alice.",
    store=True
)

# Follow-up request - reference the previous response
response2 = client.responses.create(
    model="gpt-4.1",
    input="What's my name?",
    previous_response_id=response1.id
)
```
With Function Calling
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.responses.create(
    model="gpt-4.1",
    input="What's the weather in Tokyo?",
    tools=tools,
    tool_choice="auto"
)
Response Format
```json
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1699000000,
  "model": "gpt-4.1",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 8,
    "total_tokens": 20
  }
}
```
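The SDK exposes the concatenated text as response.output_text, but a client working with the raw body walks the output array itself. A minimal sketch over the sample body above (field names are taken from the example; plain-dict handling is an assumption for clients not using the SDK):

```python
# Walk a raw Responses API body (as a plain dict) and collect the text parts.
sample = {
    "id": "resp_abc123",
    "object": "response",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "The capital of France is Paris."}
            ],
        }
    ],
}

def collect_output_text(response: dict) -> str:
    """Concatenate every output_text part across all message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as tool calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)

print(collect_output_text(sample))  # The capital of France is Paris.
```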
Supported Models
All chat models available through Apertis are supported with the Responses API. The API intelligently routes requests based on model capabilities:
Models with Native /v1/responses Support
These models natively support the Responses API format on upstream providers:
| Model Series | Examples |
|---|---|
| o1 Series | o1, o1-preview, o1-2024-12-17 |
| o3 Series | o3, o3-mini, o3-2025-04-16 |
| o4 Series | o4-mini, o4-mini-2025-04-16 |
| GPT-5 Series | gpt-5, gpt-5.1, gpt-5.2, gpt-5-* |
| Codex Models | gpt-5-codex, gpt-5-codex-* |
Responses-Only Models
Some advanced models only support the /v1/responses endpoint and cannot be used with /v1/chat/completions or /v1/messages:
| Model | Description |
|---|---|
| gpt-5-pro | GPT-5 Pro variant |
| gpt-5-chat-latest | Latest GPT-5 chat model |
| gpt-5-mini | GPT-5 Mini |
| gpt-5-nano | GPT-5 Nano |
| gpt-5-codex-* | GPT-5 Codex variants |
| o1-pro | o1 Pro |
| codex-mini | Codex Mini |
When using responses-only models, you must use the /v1/responses endpoint. Requests to /v1/chat/completions or /v1/messages will return an error for these models.
All Other Models
For models that don't natively support /v1/responses (like Claude, Gemini, GPT-4o, etc.), Apertis automatically:
- Routes the request via /v1/chat/completions internally
- Converts the response back to Responses API format
- Returns the response in the expected format
This means you can use any model with the Responses API - the conversion is handled transparently.
| Provider | Example Models | Native Support |
|---|---|---|
| OpenAI | gpt-4.1, gpt-4.1-mini | Via fallback |
| Anthropic | claude-sonnet-4.5, claude-opus-4 | Via fallback |
| Google | gemini-3-pro-preview, gemini-3-flash-preview | Via fallback |
| Meta | llama-3.1-70b, llama-3.1-8b | Via fallback |
| xAI | grok-3, grok-3-reasoning | Via fallback |
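For example, a fallback model can be called with the exact same request shape; a sketch using a model name from the table above (the conversion to /v1/chat/completions happens server-side, so the client sees a normal Responses API body):

```json
{
  "model": "claude-sonnet-4.5",
  "input": "Summarize the plot of Hamlet in one sentence."
}
```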
Differences from Chat Completions
| Feature | Responses API | Chat Completions |
|---|---|---|
| Input field | input | messages |
| String input | Supported | Not supported |
| Instructions | instructions parameter | System message in messages |
| Reasoning config | reasoning object | reasoning_effort string |
| Text config | text object | Not available |
| State management | store, previous_response_id | Manual message array |
| Token limit | max_output_tokens | max_tokens |
| Tool call limits | max_tool_calls, parallel_tool_calls | Not available |
| Truncation | truncation parameter | Not available |
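The field mapping above can be illustrated with two request bodies that ask for the same completion. First, the Responses API shape:

```json
{
  "model": "gpt-4.1",
  "instructions": "Answer briefly.",
  "input": "What is the capital of France?",
  "max_output_tokens": 100
}
```

And a sketch of the equivalent Chat Completions body, where the instructions become a system message and the token limit is expressed as max_tokens:

```json
{
  "model": "gpt-4.1",
  "messages": [
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "max_tokens": 100
}
```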
Related Topics
- Chat Completions - Traditional chat completion format
- Messages API - Anthropic-compatible format
- Models - List available models