Configuration
=============

Client limits
-------------

- **max_requests_per_minute**: requests-per-minute (RPM) limit used for
  concurrency control and request scheduling.
- **max_tokens_per_minute**: tokens-per-minute (TPM) limit that budgets
  token throughput.
- **max_workers**: upper bound on the thread pool size; defaults to
  ``min(RPM, CPU * 20)``.

Environment variables
---------------------

- **MAX_CONTEXT_TOKENS_FALLBACK**: fallback maximum context-window size, in
  tokens, used by the sanitizer to truncate long message histories when
  LiteLLM model info isn't available (default: ``100000``).
- **MAX_INPUT_TOKENS_OVERRIDE**: when set to an integer, this value is used
  as the model's maximum input tokens, taking precedence over the value
  reported by LiteLLM.

Provider credentials
--------------------

Set the credentials expected by your LiteLLM provider. For OpenAI-compatible
APIs, set ``OPENAI_API_KEY``.

Logging
-------

The library uses the standard ``logging`` module. Configure it at application
startup:

.. code-block:: python

    import logging

    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )

Autodoc and docs
----------------

The Sphinx configuration mocks heavy dependencies (``openai``,
``pyrate_limiter``, ``numpy``) to avoid import errors during documentation
builds.
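In ``conf.py`` this kind of mocking is usually a one-line setting. The sketch
below assumes the standard ``autodoc_mock_imports`` option is the mechanism in
use (the project may instead install stub mocks by hand):

.. code-block:: python

    # conf.py -- documentation build configuration (sketch).
    extensions = ["sphinx.ext.autodoc"]

    # Modules listed here are replaced by mock objects while autodoc
    # imports the package, so the docs build without these installed.
    autodoc_mock_imports = ["openai", "pyrate_limiter", "numpy"]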
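For the client limits above, the documented ``max_workers`` default can be
reproduced directly. This is an illustrative sketch of the formula, not the
library's internal code:

.. code-block:: python

    import os

    def default_max_workers(max_requests_per_minute: int) -> int:
        """Mirror the documented default: min(RPM, CPU * 20)."""
        cpu = os.cpu_count() or 1  # os.cpu_count() can return None
        return min(max_requests_per_minute, cpu * 20)

    # e.g. RPM=500 on an 8-core machine -> min(500, 160) == 160 workers
    print(default_max_workers(500))

Capping workers at the RPM limit avoids spawning threads that would only sit
idle waiting on the rate limiter.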
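The two environment variables above suggest a simple precedence order: an
explicit override wins, then the LiteLLM-reported value, then the fallback.
A sketch assuming that combined ordering (the resolver function is
hypothetical, not part of the library's API):

.. code-block:: python

    import os
    from typing import Optional

    def resolve_max_input_tokens(reported: Optional[int]) -> int:
        """Hypothetical resolver. ``reported`` stands in for the
        max-input-tokens value LiteLLM reports (None if unavailable)."""
        override = os.getenv("MAX_INPUT_TOKENS_OVERRIDE")
        if override is not None:
            return int(override)  # override beats the reported value
        if reported is not None:
            return reported
        # Sanitizer fallback when no model info is available.
        return int(os.getenv("MAX_CONTEXT_TOKENS_FALLBACK", "100000"))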
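To complement the logging setup above, individual loggers can be tuned
independently of the root level; the dependency logger name used here is an
assumption about the underlying library:

.. code-block:: python

    import logging

    # Keep the application at INFO but silence a chatty dependency.
    # "LiteLLM" is assumed to be litellm's logger name; verify for
    # your installed version.
    logging.getLogger("LiteLLM").setLevel(logging.WARNING)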