Configuration
Client limits
max_requests_per_minute: Requests-per-minute (RPM) limit that paces request scheduling and bounds concurrency.
max_tokens_per_minute: Tokens-per-minute (TPM) limit that budgets token throughput.
max_workers: Upper bound on the thread pool size (defaults to min(RPM, CPU count * 20)).
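For reference, the default max_workers bound can be reproduced as below. This is a minimal sketch of the formula stated above; the library's actual computation may differ in detail.

import os

rpm = 500  # example max_requests_per_minute value
# Default thread pool bound described above: min(RPM, CPU count * 20).
max_workers = min(rpm, (os.cpu_count() or 1) * 20)
print(max_workers)  # 500 on a machine with 25+ CPUs, otherwise CPU count * 20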
Environment variables
MAX_CONTEXT_TOKENS_FALLBACK: Fallback maximum context-window size, in tokens, used by the sanitizer to truncate long message histories when model metadata from LiteLLM is unavailable (default: 100000).
MAX_INPUT_TOKENS_OVERRIDE: When set to an integer, this value is used as the model’s maximum input tokens, taking precedence over the value reported by LiteLLM.
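Both variables can be set from the shell or in-process before the library reads them; a sketch with illustrative values:

import os

# Fallback context window when LiteLLM metadata is unavailable.
os.environ["MAX_CONTEXT_TOKENS_FALLBACK"] = "100000"

# Force a specific maximum-input-token budget, overriding LiteLLM.
os.environ["MAX_INPUT_TOKENS_OVERRIDE"] = "8192"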
Provider credentials
Set the credentials expected by your LiteLLM provider. For OpenAI-compatible APIs,
set OPENAI_API_KEY.
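For example, in Python (prefer sourcing the key from your environment or a secrets manager rather than hard-coding it):

import os

# Set the key only if the environment does not already provide one.
os.environ.setdefault("OPENAI_API_KEY", "sk-...")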
Logging
The library uses the standard logging module. Configure at application startup:
import logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
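To tune verbosity for this library alone, adjust its logger by name; the name below is a placeholder, since loggers conventionally follow the package's module path:

import logging

# "your_package" is an assumed name; substitute the library's actual package name.
logging.getLogger("your_package").setLevel(logging.DEBUG)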
Autodoc and docs
The Sphinx configuration mocks heavy dependencies (openai, pyrate_limiter, numpy)
to avoid import issues during doc builds.
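With sphinx.ext.autodoc, this kind of mocking is typically expressed via autodoc_mock_imports in conf.py; a minimal sketch matching the dependencies listed above:

# conf.py
extensions = ["sphinx.ext.autodoc"]

# Substitute mock modules for heavy dependencies so autodoc can import
# the package without installing them.
autodoc_mock_imports = ["openai", "pyrate_limiter", "numpy"]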