OpenAI Client Caching

Overview

Langroid implements client caching for OpenAI and compatible APIs (Groq, Cerebras, etc.) to improve performance and prevent resource exhaustion issues.

Configuration

Set use_cached_client in your OpenAIGPTConfig:

from langroid.language_models import OpenAIGPTConfig

config = OpenAIGPTConfig(
    chat_model="gpt-4",
    use_cached_client=True  # Default
)

Default Behavior

  • use_cached_client=True (enabled by default)
  • Instances created with identical configurations share a single underlying client (and its HTTP connection pool)
  • Differing configurations (API key, base URL, headers, etc.) get separate client instances, as the sketch below demonstrates
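
The sharing behavior can be verified directly. The sketch below assumes the cached client is exposed as the client attribute of OpenAIGPT (an internal detail that may change) and uses api_base to point at a hypothetical local endpoint:

from langroid.language_models import OpenAIGPT, OpenAIGPTConfig

config = OpenAIGPTConfig(chat_model="gpt-4", use_cached_client=True)

llm_a = OpenAIGPT(config)
llm_b = OpenAIGPT(config)

# Identical configurations: both instances wrap the same cached client.
assert llm_a.client is llm_b.client

# A differing parameter (here, the base URL) yields a separate client.
other = OpenAIGPT(
    OpenAIGPTConfig(
        chat_model="gpt-4",
        api_base="http://localhost:8000/v1",  # hypothetical local endpoint
        use_cached_client=True,
    )
)
assert other.client is not llm_a.client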

Benefits

  • Connection Pooling: Reuses TCP connections, reducing latency and overhead
  • Resource Efficiency: Prevents "too many open files" errors when creating many agents
  • Performance: Eliminates connection handshake overhead on subsequent requests
  • Thread Safety: Shared clients are safe to use across threads (see the sketch below)
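
As a minimal illustration of the thread-safety point, the sketch below fans prompts out over a thread pool while every worker goes through the same cached client. It assumes an OPENAI_API_KEY in the environment and uses chat with a plain string prompt, as in Langroid's quick-start examples:

from concurrent.futures import ThreadPoolExecutor

from langroid.language_models import OpenAIGPT, OpenAIGPTConfig

llm = OpenAIGPT(OpenAIGPTConfig(chat_model="gpt-4", use_cached_client=True))

prompts = [f"In one word, name planet number {i}." for i in range(1, 6)]

# All five workers issue requests through the same shared client;
# its connection pool handles the concurrent requests.
with ThreadPoolExecutor(max_workers=5) as pool:
    responses = list(pool.map(lambda p: llm.chat(p, max_tokens=10), prompts))

for r in responses:
    print(r.message)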

When to Disable Client Caching

Set use_cached_client=False in these scenarios:

  1. Multiprocessing: Each process should have its own client instance (see the sketch after the example below)
  2. Client Isolation: When you need complete isolation between different agent instances
  3. Debugging: To rule out client sharing as a source of issues
  4. Legacy Compatibility: If your existing code depends on unique client instances

Example: Disabling Client Caching

from langroid.language_models import OpenAIGPTConfig

config = OpenAIGPTConfig(
    chat_model="gpt-4",
    use_cached_client=False  # Each instance gets its own client
)
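
Building on that, here is a sketch of the multiprocessing scenario from the list above: each worker process builds its own uncached client, so no connection pool is shared across process boundaries. The worker function and prompts are illustrative, not part of Langroid's API:

from multiprocessing import Pool

from langroid.language_models import OpenAIGPT, OpenAIGPTConfig

def worker(prompt: str) -> str:
    # Each process builds a fresh, uncached client; nothing is
    # inherited from the parent process's connection pool.
    llm = OpenAIGPT(
        OpenAIGPTConfig(chat_model="gpt-4", use_cached_client=False)
    )
    return llm.chat(prompt, max_tokens=10).message

if __name__ == "__main__":
    prompts = ["Name a prime number.", "Name a color.", "Name a fruit."]
    with Pool(processes=3) as pool:
        print(pool.map(worker, prompts))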

Technical Details

  • Uses SHA256-based cache keys to identify unique configurations
  • Implements a singleton-per-configuration pattern with lazy initialization (sketched below)
  • Automatically cleans up clients on program exit via atexit hooks
  • Compatible with both sync and async OpenAI clients
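
For intuition, the following sketch reimplements the pattern these details describe: a SHA256 hash of the configuration acts as the cache key, the client is created lazily on first lookup, and an atexit hook closes every cached client at shutdown. This is a simplified illustration using the openai package's OpenAI client, not Langroid's actual code:

import atexit
import hashlib
import json

from openai import OpenAI

_client_cache: dict[str, OpenAI] = {}

def get_cached_client(**config) -> OpenAI:
    # SHA256 of the canonicalized config identifies a unique configuration.
    key = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()
    # Lazy initialization: create the client only on first request.
    if key not in _client_cache:
        _client_cache[key] = OpenAI(**config)
    return _client_cache[key]

def _close_all() -> None:
    # Release pooled connections for every cached client at exit.
    for client in _client_cache.values():
        client.close()

atexit.register(_close_all)

# Identical configs map to the same hash, hence the same client.
a = get_cached_client(api_key="sk-...", base_url="https://api.openai.com/v1")
b = get_cached_client(api_key="sk-...", base_url="https://api.openai.com/v1")
assert a is b

A production version would also guard the cache with a lock, so concurrent first lookups from multiple threads cannot create duplicate clients.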