Configuration

Client limits

  • max_requests_per_minute: requests-per-minute (RPM) limit; drives concurrency and request scheduling.

  • max_tokens_per_minute: tokens-per-minute (TPM) limit; budgets token throughput.

  • max_workers: upper bound on the thread pool size (defaults to min(max_requests_per_minute, CPU count × 20)).
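
The RPM limiter above can be pictured as a sliding window over recent request timestamps. The sketch below is purely illustrative (it is not the library's implementation, which relies on pyrate_limiter); `SlidingWindowLimiter` is a hypothetical name:

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Illustrative sliding-window rate limiter (not the library's own code)."""

    def __init__(self, max_events: int, window_seconds: float = 60.0):
        self.max_events = max_events
        self.window = window_seconds
        self.timestamps = deque()  # monotonic timestamps of admitted events
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Return True if another event fits in the current window."""
        now = time.monotonic()
        with self.lock:
            # Drop timestamps that have aged out of the window.
            while self.timestamps and now - self.timestamps[0] >= self.window:
                self.timestamps.popleft()
            if len(self.timestamps) < self.max_events:
                self.timestamps.append(now)
                return True
            return False


# With a limit of 3 per minute, a fourth immediate call is rejected.
limiter = SlidingWindowLimiter(max_events=3)
results = [limiter.try_acquire() for _ in range(4)]
```

A TPM limiter works the same way, except each acquisition is weighted by the token count of the request rather than counting one event per call.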

Environment variables

  • MAX_CONTEXT_TOKENS_FALLBACK: Fallback maximum context window tokens used by the sanitizer to truncate long message histories when LiteLLM info isn’t available (default: 100000).

  • MAX_INPUT_TOKENS_OVERRIDE: When set to an integer, this value is used as the model’s maximum input tokens, taking precedence over the value reported by LiteLLM.
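
The precedence described above can be sketched with two hypothetical helpers (the function names are illustrative, not the library's API):

```python
import os
from typing import Optional


def max_input_tokens(litellm_reported: Optional[int]) -> Optional[int]:
    """MAX_INPUT_TOKENS_OVERRIDE, when set, wins over the LiteLLM-reported value."""
    override = os.environ.get("MAX_INPUT_TOKENS_OVERRIDE")
    return int(override) if override is not None else litellm_reported


def context_window_fallback() -> int:
    """Context window used by the sanitizer when LiteLLM has no model info."""
    return int(os.environ.get("MAX_CONTEXT_TOKENS_FALLBACK", "100000"))
```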

Provider credentials

Set the credentials expected by your LiteLLM provider. For OpenAI-compatible APIs, set OPENAI_API_KEY.
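
A missing credential typically surfaces only when the first request fails, so it can be worth checking at startup. A minimal sketch (the helper name is hypothetical):

```python
import os


def require_credentials(var: str = "OPENAI_API_KEY") -> None:
    """Fail fast at startup if the expected provider credential is missing."""
    if not os.environ.get(var):
        raise RuntimeError(f"{var} is not set; the provider will reject requests.")
```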

Logging

The library uses the standard logging module. Configure at application startup:

import logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")

Autodoc and docs

The Sphinx configuration mocks heavy dependencies (openai, pyrate_limiter, numpy) to avoid import issues during doc builds.
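
This mocking is done with Sphinx's standard autodoc mechanism; a conf.py fragment along these lines (the exact file contents are an assumption) would reproduce it:

```python
# docs/conf.py (fragment)
extensions = ["sphinx.ext.autodoc"]

# Stub out heavy dependencies so autodoc can import the package
# without installing them in the docs build environment.
autodoc_mock_imports = ["openai", "pyrate_limiter", "numpy"]
```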