Configuration
Client limits
max_requests_per_minute: Requests-per-minute (RPM) limit that paces request scheduling and bounds concurrency.
max_tokens_per_minute: Tokens-per-minute (TPM) limit that budgets token throughput.
max_workers: Upper bound on the thread pool size (defaults to min(RPM, CPU count * 20)).
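For reference, the default max_workers bound can be reproduced as below. This is a minimal sketch of the formula stated above; the library's actual computation may differ in detail.

import os

rpm = 500  # example max_requests_per_minute value
# Default thread pool bound described above: min(RPM, CPU count * 20).
max_workers = min(rpm, (os.cpu_count() or 1) * 20)
print(max_workers)  # 500 on a machine with 25+ CPUs, otherwise CPU count * 20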
Environment variables
MAX_CONTEXT_TOKENS_FALLBACK: Fallback maximum context-window size, in tokens, used by the sanitizer to truncate long message histories when model metadata from LiteLLM is unavailable (default: 100000).
MAX_INPUT_TOKENS_OVERRIDE: When set to an integer, this value is used as the model’s maximum input tokens, taking precedence over the value reported by LiteLLM.
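Both variables can be set from the shell or in-process before the library reads them; a sketch with illustrative values:

import os

# Fallback context window when LiteLLM metadata is unavailable.
os.environ["MAX_CONTEXT_TOKENS_FALLBACK"] = "100000"

# Force a specific maximum-input-token budget, overriding LiteLLM.
os.environ["MAX_INPUT_TOKENS_OVERRIDE"] = "8192"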
Provider credentials
Set the credentials expected by your LiteLLM provider. For OpenAI-compatible APIs,
set OPENAI_API_KEY.
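For example, in Python (prefer sourcing the key from your environment or a secrets manager rather than hard-coding it):

import os

# Set the key only if the environment does not already provide one.
os.environ.setdefault("OPENAI_API_KEY", "sk-...")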
Logging
The library uses the standard logging module. Configure at application startup:
import logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
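To tune verbosity for this library alone, adjust its logger by name; the name below is a placeholder, since loggers conventionally follow the package's module path:

import logging

# "your_package" is an assumed name; substitute the library's actual package name.
logging.getLogger("your_package").setLevel(logging.DEBUG)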
Autodoc and docs
The Sphinx configuration mocks heavy dependencies (openai, pyrate_limiter, numpy)
to avoid import issues during doc builds.
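With sphinx.ext.autodoc, this kind of mocking is typically expressed via autodoc_mock_imports in conf.py; a minimal sketch matching the dependencies listed above:

# conf.py
extensions = ["sphinx.ext.autodoc"]

# Substitute mock modules for heavy dependencies so autodoc can import
# the package without installing them.
autodoc_mock_imports = ["openai", "pyrate_limiter", "numpy"]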