Skip to main content

Fallback Models

Stima API provides a powerful fallback model mechanism that automatically switches to backup models when the primary model request fails, ensuring high availability and service stability.

Overview

The fallback model mechanism automatically handles the following scenarios:

  • Model service timeouts or unresponsiveness
  • Network connectivity issues
  • Specified model temporarily unavailable
  • Service provider temporary outages

When the primary model fails, the system will attempt backup models in your specified priority order until one succeeds.

Configuration Methods

Request-Level Configuration (Highest Priority)

You can specify fallback model configuration directly in each API request:

{
"model": "gpt-4",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
],
"fallback_models": ["gpt-3.5-turbo", "claude-3-haiku-20240307"],
"fallback_timeout": 25000,
"fallback_enabled": true
}

Token-Level Configuration

You can also pre-configure fallback models in your API Token settings, which will automatically apply to all requests using that token.

Parameter Reference

fallback_models

  • Type: Array of strings
  • Description: List of fallback models in priority order
  • Limit: Maximum 5 fallback models
  • Example: ["gpt-3.5-turbo", "claude-3-haiku-20240307", "gemini-pro"]

fallback_timeout

  • Type: Integer
  • Unit: Milliseconds (ms)
  • Range: 5,000 - 300,000 milliseconds (5 - 300 seconds)
  • Default: 30,000 milliseconds (30 seconds)
  • Description: Wait time before switching to the next fallback model

fallback_enabled

  • Type: Boolean
  • Default: false
  • Description: Whether to enable the fallback model mechanism

Fallback Trigger Conditions

The system automatically triggers fallback models in the following situations:

1. Timeout Errors

  • Request exceeds the configured fallback_timeout duration
  • Context deadline exceeded

2. Connection Errors

  • Connection refused
  • Connection reset
  • Network unreachable
  • DNS resolution failures

3. Model Errors

  • Model not found
  • Invalid model
  • Model temporarily unavailable

4. HTTP Status Errors

  • Non-200 HTTP status codes

Request Examples

Python Examples

import openai

# Configure Stima API
client = openai.OpenAI(
api_key="your-stima-api-key",
base_url="https://api.stima.tech/v1"
)

# Chat request with fallback models
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "user", "content": "Hello, how are you?"}
],
# Fallback configuration
extra_body={
"fallback_models": ["gpt-3.5-turbo", "claude-3-haiku-20240307"],
"fallback_timeout": 25000, # 25 second timeout
"fallback_enabled": True
}
)

print(response.choices[0].message.content)

# Check if fallback model was used
if hasattr(response, 'headers'):
if response.headers.get('X-Fallback-Used') == 'true':
print(f"Fallback model used: {response.headers.get('X-Actual-Model')}")
print(f"Original model: {response.headers.get('X-Fallback-From')}")
print(f"Fallback reason: {response.headers.get('X-Fallback-Reason')}")

Complete Example with requests Library

import requests
import json

url = "https://api.stima.tech/v1/chat/completions"
headers = {
"Authorization": "Bearer your-stima-api-key",
"Content-Type": "application/json"
}

payload = {
"model": "gpt-4",
"messages": [
{"role": "user", "content": "Write a Python function for me"}
],
"fallback_models": ["gpt-3.5-turbo", "claude-3-haiku-20240307"],
"fallback_timeout": 20000,
"fallback_enabled": True
}

try:
response = requests.post(url, headers=headers, json=payload, timeout=60)

# Check fallback model usage
fallback_used = response.headers.get('X-Fallback-Used', 'false')
if fallback_used == 'true':
print(f"Fallback model activated!")
print(f"Original model: {response.headers.get('X-Fallback-From')}")
print(f"Actual model used: {response.headers.get('X-Actual-Model')}")
print(f"Fallback reason: {response.headers.get('X-Fallback-Reason')}")

result = response.json()
print(result['choices'][0]['message']['content'])

except requests.exceptions.RequestException as e:
print(f"Request failed: {e}")

cURL Examples

# Basic fallback request
curl -X POST "https://api.stima.tech/v1/chat/completions" \
-H "Authorization: Bearer your-stima-api-key" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"model": "gpt-4",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
],
"fallback_models": ["gpt-3.5-turbo", "claude-3-haiku-20240307"],
"fallback_timeout": 25000,
"fallback_enabled": true
}' \
-w "\nStatus: %{http_code}\nResponse time: %{time_total}s\n" \
-v

cURL Example with Header Inspection

# Display full headers to check fallback usage
curl -X POST "https://api.stima.tech/v1/chat/completions" \
-H "Authorization: Bearer your-stima-api-key" \
-H "Content-Type: application/json" \
-D headers.txt \
-d '{
"model": "gpt-4",
"messages": [
{"role": "user", "content": "Test fallback mechanism"}
],
"fallback_models": ["gpt-3.5-turbo"],
"fallback_timeout": 15000,
"fallback_enabled": true
}'

# Check fallback-related headers
echo "Fallback usage information:"
grep -i "x-fallback" headers.txt

Response Headers

When using the fallback model mechanism, the API response includes the following custom headers:

Header NameDescriptionExample Value
X-Fallback-UsedWhether fallback model was usedtrue / false
X-Fallback-FromOriginal requested model namegpt-4
X-Actual-ModelActually used model namegpt-3.5-turbo
X-Fallback-ReasonReason for triggering fallbackprimary_model_failed

Configuration Priority

  1. Request Level - fallback_* parameters in API request (highest priority)
  2. Token Level - Fallback model configuration in API Token settings
  3. System Default - 30 second timeout, fallback disabled

Best Practices

1. Choose Appropriate Fallback Models

  • Select fast-responding models as backups
  • Consider cost factors, fallback from expensive to cheaper models
  • Ensure fallback models support the same features

2. Set Reasonable Timeout Values

  • General recommendation: 20-30 seconds
  • Complex tasks: 60-120 seconds
  • Real-time chat: 10-15 seconds

3. Fallback Model Count

  • Recommend 1-3 fallback models
  • Too many fallback models increase total response time
  • System limit: maximum 5 fallback models

Monitoring and Debugging

Check Fallback Usage

# Python example: Check response headers
def check_fallback_usage(response):
headers = response.headers if hasattr(response, 'headers') else {}

fallback_used = headers.get('X-Fallback-Used', 'false')
if fallback_used == 'true':
print("Fallback model information:")
print(f" Original model: {headers.get('X-Fallback-From')}")
print(f" Actual model: {headers.get('X-Actual-Model')}")
print(f" Fallback reason: {headers.get('X-Fallback-Reason')}")
else:
print("Using original model, no fallback triggered")

Common Issues

Q: Why wasn't the fallback model triggered? A: Check if fallback_enabled is set to true and fallback_models is properly configured.

Q: What happens when all fallback models fail? A: The system returns the error from the last attempted fallback model. Check model availability and network connectivity.

Q: How to track fallback model usage frequency? A: Monitor the X-Fallback-Used header in responses to track fallback usage statistics.

Billing Information

  • Each model is billed at its standard rate
  • When using fallback models, billing is based on the actually used model
  • Use the X-Actual-Model header to confirm the billing model