Fallback Models

Stima API provides a fallback model mechanism that automatically switches to backup models when a request to the primary model fails, which helps keep your service available during model or provider outages.

Overview

The fallback model mechanism automatically handles the following scenarios:

Model service timeouts or unresponsiveness
Network connectivity issues
Specified model temporarily unavailable
Service provider temporary outages

When the primary model fails, the system tries the backup models in the priority order you specify until one succeeds.
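The ordering described above can be pictured with a small client-side sketch. It is only an illustration of "try in priority order until one succeeds", not Stima's server implementation; `call_with_fallback` and `fake_send` are hypothetical names introduced here.

```python
from typing import Callable, Optional, Sequence, Tuple


def call_with_fallback(
    primary: str,
    fallbacks: Sequence[str],
    send: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the primary model, then each fallback in priority order.

    Returns (model_used, result). If every model fails, the error from
    the last attempted model is re-raised.
    """
    last_error: Optional[Exception] = None
    for model in [primary, *fallbacks]:
        try:
            return model, send(model)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    assert last_error is not None
    raise last_error


def fake_send(model: str) -> str:
    # Hypothetical stand-in for a model call: only gpt-3.5-turbo succeeds.
    if model != "gpt-3.5-turbo":
        raise TimeoutError(f"{model} timed out")
    return f"reply from {model}"


model_used, reply = call_with_fallback(
    "gpt-4", ["gpt-3.5-turbo", "claude-3-haiku-20240307"], fake_send
)
print(model_used, "->", reply)
```

Because the loop returns on the first success, later fallbacks are never contacted once an earlier model answers.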

Configuration Methods

Request-Level Configuration (Highest Priority)

You can specify fallback model configuration directly in each API request:

{
  "model": "gpt-4",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "fallback_models": ["gpt-3.5-turbo", "claude-3-haiku-20240307"],
  "fallback_timeout": 25000,
  "fallback_enabled": true
}

Token-Level Configuration

You can also pre-configure fallback models in your API Token settings, which will automatically apply to all requests using that token.

Parameter Reference

`fallback_models`

Type: Array of strings
Description: List of fallback models in priority order
Limit: Maximum 5 fallback models
Example: ["gpt-3.5-turbo", "claude-3-haiku-20240307", "gemini-pro"]

`fallback_timeout`

Type: Integer
Unit: Milliseconds (ms)
Range: 5,000 - 300,000 milliseconds (5 - 300 seconds)
Default: 30,000 milliseconds (30 seconds)
Description: Wait time before switching to the next fallback model

`fallback_enabled`

Type: Boolean
Default: false
Description: Whether to enable the fallback model mechanism
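The limits listed in this reference can be checked on the client before a request is sent. The helper below is a minimal sketch under that assumption; `build_fallback_params` is a hypothetical name, and the API may perform its own validation with different error messages.

```python
from typing import Any, Dict, List

MAX_FALLBACK_MODELS = 5              # documented limit
TIMEOUT_RANGE_MS = (5_000, 300_000)  # documented range
DEFAULT_TIMEOUT_MS = 30_000          # documented default


def build_fallback_params(
    models: List[str],
    timeout_ms: int = DEFAULT_TIMEOUT_MS,
    enabled: bool = False,
) -> Dict[str, Any]:
    """Validate fallback settings and return them as request-body fields."""
    if len(models) > MAX_FALLBACK_MODELS:
        raise ValueError(f"at most {MAX_FALLBACK_MODELS} fallback models allowed")
    low, high = TIMEOUT_RANGE_MS
    if not low <= timeout_ms <= high:
        raise ValueError(f"fallback_timeout must be within {low}-{high} ms")
    return {
        "fallback_models": list(models),
        "fallback_timeout": timeout_ms,
        "fallback_enabled": enabled,
    }


params = build_fallback_params(
    ["gpt-3.5-turbo", "claude-3-haiku-20240307"], timeout_ms=25_000, enabled=True
)
print(params)
```

The returned dictionary can be merged into a JSON request body or passed as `extra_body` when using the OpenAI SDK.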

Fallback Trigger Conditions

The system automatically triggers fallback models in the following situations:

1. Timeout Errors

Request exceeds the configured fallback_timeout duration
Context deadline exceeded

2. Connection Errors

Connection refused
Connection reset
Network unreachable
DNS resolution failures

3. Model Errors

Model not found
Invalid model
Model temporarily unavailable

4. HTTP Status Errors

Non-200 HTTP status codes

Request Examples

Python Examples

import openai

# Configure Stima API
client = openai.OpenAI(
    api_key="your-stima-api-key",
    base_url="https://api.stima.tech/v1"
)

# Chat request with fallback models.
# with_raw_response exposes the HTTP headers; the parsed ChatCompletion
# object itself has no .headers attribute.
raw_response = client.chat.completions.with_raw_response.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    # Fallback configuration (non-OpenAI fields are sent via extra_body)
    extra_body={
        "fallback_models": ["gpt-3.5-turbo", "claude-3-haiku-20240307"],
        "fallback_timeout": 25000,  # 25-second timeout
        "fallback_enabled": True
    }
)

# Parse the body into the usual ChatCompletion object
response = raw_response.parse()
print(response.choices[0].message.content)

# Check if a fallback model was used
headers = raw_response.headers
if headers.get('X-Fallback-Used') == 'true':
    print(f"Fallback model used: {headers.get('X-Actual-Model')}")
    print(f"Original model: {headers.get('X-Fallback-From')}")
    print(f"Fallback reason: {headers.get('X-Fallback-Reason')}")

Complete Example with requests Library

import requests

url = "https://api.stima.tech/v1/chat/completions"
headers = {
    "Authorization": "Bearer your-stima-api-key",
    "Content-Type": "application/json"
}

payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "user", "content": "Write a Python function for me"}
    ],
    "fallback_models": ["gpt-3.5-turbo", "claude-3-haiku-20240307"],
    "fallback_timeout": 20000,
    "fallback_enabled": True
}

try:
    response = requests.post(url, headers=headers, json=payload, timeout=60)

    # Check fallback model usage
    fallback_used = response.headers.get('X-Fallback-Used', 'false')
    if fallback_used == 'true':
        print("Fallback model activated!")
        print(f"Original model: {response.headers.get('X-Fallback-From')}")
        print(f"Actual model used: {response.headers.get('X-Actual-Model')}")
        print(f"Fallback reason: {response.headers.get('X-Fallback-Reason')}")

    # Raise for 4xx/5xx responses so an error body is not parsed as a completion
    response.raise_for_status()
    result = response.json()
    print(result['choices'][0]['message']['content'])

except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

cURL Examples

# Basic fallback request
curl -X POST "https://api.stima.tech/v1/chat/completions" \
  -H "Authorization: Bearer your-stima-api-key" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "fallback_models": ["gpt-3.5-turbo", "claude-3-haiku-20240307"],
    "fallback_timeout": 25000,
    "fallback_enabled": true
  }' \
  -w "\nStatus: %{http_code}\nResponse time: %{time_total}s\n" \
  -v

cURL Example with Header Inspection

# Display full headers to check fallback usage
curl -X POST "https://api.stima.tech/v1/chat/completions" \
  -H "Authorization: Bearer your-stima-api-key" \
  -H "Content-Type: application/json" \
  -D headers.txt \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Test fallback mechanism"}
    ],
    "fallback_models": ["gpt-3.5-turbo"],
    "fallback_timeout": 15000,
    "fallback_enabled": true
  }'

# Check fallback-related headers
echo "Fallback usage information:"
grep -i "x-fallback" headers.txt

Response Headers

When using the fallback model mechanism, the API response includes the following custom headers:

| Header Name | Description | Example Value |
| --- | --- | --- |
| `X-Fallback-Used` | Whether fallback model was used | `true` / `false` |
| `X-Fallback-From` | Original requested model name | `gpt-4` |
| `X-Actual-Model` | Actually used model name | `gpt-3.5-turbo` |
| `X-Fallback-Reason` | Reason for triggering fallback | `primary_model_failed` |

Configuration Priority

1. Request Level - fallback_* parameters in the API request (highest priority)
2. Token Level - fallback model configuration in the API Token settings
3. System Default - 30-second timeout, fallback disabled
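The precedence above can be modelled as a layered lookup. The sketch below assumes settings are resolved field by field, so a request that sets only `fallback_timeout` still inherits the other fields from the token; the source does not say whether Stima merges per field or replaces the whole configuration. `resolve_fallback_config` is a hypothetical helper.

```python
from typing import Any, Dict, Optional

# System defaults as documented: 30-second timeout, fallback disabled.
SYSTEM_DEFAULTS: Dict[str, Any] = {
    "fallback_models": [],
    "fallback_timeout": 30_000,
    "fallback_enabled": False,
}


def resolve_fallback_config(
    request: Dict[str, Any],
    token: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Resolve settings with request > token > system default precedence."""
    resolved = dict(SYSTEM_DEFAULTS)
    # Apply the lower-priority layer first so higher layers overwrite it.
    for layer in (token or {}, request):
        for key in SYSTEM_DEFAULTS:
            if key in layer:
                resolved[key] = layer[key]
    return resolved


token_settings = {"fallback_models": ["gpt-3.5-turbo"], "fallback_enabled": True}
request_body = {"model": "gpt-4", "fallback_timeout": 15_000}
effective = resolve_fallback_config(request_body, token_settings)
print(effective)
```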

Best Practices

1. Choose Appropriate Fallback Models

Select fast-responding models as backups
Consider cost: fall back from more expensive models to cheaper ones
Ensure fallback models support the same features

2. Set Reasonable Timeout Values

General recommendation: 20-30 seconds
Complex tasks: 60-120 seconds
Real-time chat: 10-15 seconds

3. Fallback Model Count

Recommended: 1-3 fallback models
Each additional fallback model can add up to one fallback_timeout to the worst-case response time
System limit: maximum 5 fallback models
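A rough upper bound makes the trade-off between fallback count and response time concrete. The arithmetic below assumes every attempt, including the primary, may run for the full `fallback_timeout` before the next model is tried; actual behaviour may differ, and `worst_case_wait_seconds` is a hypothetical helper.

```python
def worst_case_wait_seconds(fallback_count: int, timeout_ms: int) -> float:
    """Upper bound on total wait if every attempt runs into the timeout."""
    attempts = 1 + fallback_count  # the primary model plus each fallback
    return attempts * timeout_ms / 1000


for count in (1, 3, 5):
    bound = worst_case_wait_seconds(count, 25_000)
    print(f"{count} fallback(s) at 25 s each: up to {bound:.0f} s")
```

Under this assumption, a client-side HTTP timeout shorter than the bound could end the request before the last fallback has had a chance to respond.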

Monitoring and Debugging

Check Fallback Usage

# Python example: Check response headers
def check_fallback_usage(response):
    headers = response.headers if hasattr(response, 'headers') else {}
    
    fallback_used = headers.get('X-Fallback-Used', 'false')
    if fallback_used == 'true':
        print("Fallback model information:")
        print(f"  Original model: {headers.get('X-Fallback-From')}")
        print(f"  Actual model: {headers.get('X-Actual-Model')}")
        print(f"  Fallback reason: {headers.get('X-Fallback-Reason')}")
    else:
        print("Using original model, no fallback triggered")

Common Issues

Q: Why wasn't the fallback model triggered?
A: Check that fallback_enabled is set to true and that fallback_models is properly configured.

Q: What happens when all fallback models fail?
A: The system returns the error from the last attempted fallback model. Check model availability and network connectivity.

Q: How can I track how often fallback models are used?
A: Monitor the X-Fallback-Used header in responses to collect fallback usage statistics.
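One way to collect those statistics is to tally the response headers in your own client. The class below is a minimal in-memory sketch; `FallbackStats` is a hypothetical name, and a production setup would more likely export these counts to a metrics system. It accepts any mapping of headers; note that a plain dict is case-sensitive, whereas the header objects returned by requests are not.

```python
from collections import Counter
from typing import Mapping


class FallbackStats:
    """Tally fallback usage from the X-Fallback-* response headers."""

    def __init__(self) -> None:
        self.total = 0
        self.by_model: Counter = Counter()
        self.by_reason: Counter = Counter()

    def record(self, headers: Mapping[str, str]) -> None:
        """Record one response; count it as a fallback if the header says so."""
        self.total += 1
        if headers.get("X-Fallback-Used", "false").lower() == "true":
            self.by_model[headers.get("X-Actual-Model", "unknown")] += 1
            self.by_reason[headers.get("X-Fallback-Reason", "unknown")] += 1

    @property
    def fallback_rate(self) -> float:
        """Fraction of recorded responses served by a fallback model."""
        used = sum(self.by_model.values())
        return used / self.total if self.total else 0.0


stats = FallbackStats()
stats.record({"X-Fallback-Used": "false"})
stats.record({"X-Fallback-Used": "false"})
stats.record({
    "X-Fallback-Used": "true",
    "X-Actual-Model": "gpt-3.5-turbo",
    "X-Fallback-Reason": "primary_model_failed",
})
print(f"Fallback rate: {stats.fallback_rate:.0%}")
print(dict(stats.by_model))
```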

Billing Information

Each model is billed at its standard rate
When a fallback model is used, billing is based on the model that actually served the request
Use the X-Actual-Model header to confirm which model was billed
