# Prompt Cache

## Overview
Stima API includes a powerful Prompt Cache feature that significantly improves system performance, reduces processing time for repeated requests, and lowers API costs. When the same prompt is requested multiple times, the system returns results directly from the cache without calling the upstream AI model again.
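For example, sending the same request twice should see the second response served from the cache in milliseconds. A minimal sketch, assuming an OpenAI-compatible `/v1/chat/completions` endpoint (the base URL below is a placeholder, not the real Stima endpoint) and an API key in a `STIMA_API_KEY` environment variable:

```python
import os
import time
import requests

# Placeholder base URL: substitute your actual Stima API endpoint.
BASE_URL = "https://api.stima.example/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['STIMA_API_KEY']}"}

payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Explain TTL in one sentence."}],
    "temperature": 0.0,
}

# The first call goes to the upstream model; the second identical call
# should be answered from the prompt cache.
for attempt in range(2):
    start = time.perf_counter()
    resp = requests.post(BASE_URL, json=payload, headers=HEADERS, timeout=60)
    print(f"call {attempt + 1}: {time.perf_counter() - start:.3f}s")
```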
## Key Features

### Core Benefits

- **Performance Boost**: Cache hits cut response time from seconds to milliseconds
- **Cost Savings**: Fewer calls to upstream AI providers lower usage costs
- **Exact Matching**: Cached responses are returned only for exact prompt matches
- **Automatic Management**: A built-in TTL (time-to-live) mechanism automatically evicts expired entries
### Technical Architecture

- **Cache Strategy**: LRU (Least Recently Used) eviction policy
- **Concurrency Safety**: Cache operations are safe under high concurrency
- **Fault Tolerance**: Requests automatically fall back to the normal flow when the cache fails (see the sketch below)
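The internals are not published, so the following is only a sketch of how these three properties can fit together: an LRU map with per-entry TTL behind a lock, plus a wrapper that falls back to the upstream call whenever the cache misses or errors. All names here (`PromptCache`, `call_upstream`) are illustrative, not part of the Stima API.

```python
import threading
import time
from collections import OrderedDict


class PromptCache:
    """Illustrative thread-safe LRU cache with per-entry TTL."""

    def __init__(self, max_entries: int = 1024, ttl_seconds: float = 300.0):
        self._entries: OrderedDict[str, tuple[float, str]] = OrderedDict()
        self._lock = threading.Lock()  # concurrency safety
        self._max_entries = max_entries
        self._ttl = ttl_seconds

    def get(self, key: str) -> str | None:
        with self._lock:
            item = self._entries.get(key)
            if item is None:
                return None
            expires_at, value = item
            if time.monotonic() >= expires_at:
                del self._entries[key]  # expired entries count as misses
                return None
            self._entries.move_to_end(key)  # mark as most recently used
            return value

    def put(self, key: str, value: str) -> None:
        with self._lock:
            self._entries[key] = (time.monotonic() + self._ttl, value)
            self._entries.move_to_end(key)
            while len(self._entries) > self._max_entries:
                self._entries.popitem(last=False)  # evict least recently used


def cached_completion(cache: PromptCache, key: str, call_upstream) -> str:
    """Serve from cache when possible; never let a cache error block a request."""
    try:
        hit = cache.get(key)
        if hit is not None:
            return hit
    except Exception:
        pass  # fault tolerance: fall through to the normal request flow
    value = call_upstream()
    try:
        cache.put(key, value)
    except Exception:
        pass
    return value
```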
## Cache Strategy

### Cache Key Generation

The system generates a unique cache key from the following inputs (see the sketch after this list):
- Model name
- Prompt content
- System message
- Temperature parameter
- Other relevant parameters
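One common way to build such a key (a sketch only; the hashing scheme Stima actually uses is not documented here) is to serialize the cache-relevant request fields into canonical JSON and hash the result:

```python
import hashlib
import json


def cache_key(model: str, messages: list[dict], temperature: float, **params) -> str:
    """Hash the cache-relevant request fields into a fixed-size key."""
    material = {
        "model": model,
        "messages": messages,       # covers both system and user content
        "temperature": temperature,
        "params": params,           # any other relevant parameters
    }
    # sort_keys makes the serialization canonical, so logically identical
    # requests always produce the same key.
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the key covers every listed input, changing any one of them, even just the temperature, produces a different key and therefore a cache miss.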
### Cache Hit Conditions

A cache hit requires all of the following:

- The exact same prompt content
- The same model and parameter settings
- A cache entry that has not yet expired
- A cache size within the configured limits
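These conditions follow from exact-match keying: any change to the inputs, however small, yields a different key. A quick check against the illustrative `cache_key` function above:

```python
base = cache_key("gpt-4o", [{"role": "user", "content": "Hello"}], 0.0)

# Trailing whitespace changes the prompt content: different key, cache miss.
assert base != cache_key("gpt-4o", [{"role": "user", "content": "Hello "}], 0.0)

# A different temperature also misses.
assert base != cache_key("gpt-4o", [{"role": "user", "content": "Hello"}], 0.7)
```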
### Cache Invalidation

A cache entry is invalidated when any of the following occurs (a Redis-based sketch follows):

- Its TTL expires
- Redis runs out of storage space and evicts entries
- The cache is cleared manually
- The system restarts (if the cache is not persisted)
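Since Redis is named as the backing store, here is a minimal sketch of how these invalidation paths map onto standard Redis commands via the `redis-py` client; the key prefix, TTL, and memory limit are assumed values for illustration:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

entry_key = "prompt-cache:demo"          # placeholder cache key
cached_response = '{"content": "..."}'   # placeholder cached payload

# TTL expiry: SETEX stores the value and schedules automatic deletion.
r.setex(entry_key, 300, cached_response)  # expires after 300 seconds

# Storage pressure: with an LRU maxmemory policy, Redis itself evicts the
# least recently used keys once the memory limit is reached.
r.config_set("maxmemory", "256mb")
r.config_set("maxmemory-policy", "allkeys-lru")

# Manual clearing: delete one entry (or FLUSHDB to clear everything).
r.delete(entry_key)
```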