Configuration
=============

Client limits
-------------

- **max_requests_per_minute**: requests-per-minute (RPM) limit used for
  concurrency control and request scheduling.
- **max_tokens_per_minute**: tokens-per-minute (TPM) limit that budgets
  token throughput.
- **max_workers**: upper bound on the thread pool size; defaults to
  ``min(RPM, CPU * 20)``.

Environment variables
---------------------

- **MAX_CONTEXT_TOKENS_FALLBACK**: fallback maximum context-window size, in
  tokens, used by the sanitizer to truncate long message histories when
  LiteLLM model info isn't available (default: ``100000``).
- **MAX_INPUT_TOKENS_OVERRIDE**: when set to an integer, this value is used
  as the model's maximum input tokens, taking precedence over the value
  reported by LiteLLM.

Provider credentials
--------------------

Set the credentials expected by your LiteLLM provider. For OpenAI-compatible
APIs, set ``OPENAI_API_KEY``.

Logging
-------

The library uses the standard ``logging`` module. Configure it at application
startup:

.. code-block:: python

    import logging

    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )

Autodoc and docs
----------------

The Sphinx configuration mocks heavy dependencies (``openai``,
``pyrate_limiter``, ``numpy``) to avoid import errors during documentation
builds.
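In ``conf.py`` this kind of mocking is usually a one-line setting. The sketch
below assumes the standard ``autodoc_mock_imports`` option is the mechanism in
use (the project may instead install stub mocks by hand):

.. code-block:: python

    # conf.py -- documentation build configuration (sketch).
    extensions = ["sphinx.ext.autodoc"]

    # Modules listed here are replaced by mock objects while autodoc
    # imports the package, so the docs build without these installed.
    autodoc_mock_imports = ["openai", "pyrate_limiter", "numpy"]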
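For the client limits above, the documented ``max_workers`` default can be
reproduced directly. This is an illustrative sketch of the formula, not the
library's internal code:

.. code-block:: python

    import os

    def default_max_workers(max_requests_per_minute: int) -> int:
        """Mirror the documented default: min(RPM, CPU * 20)."""
        cpu = os.cpu_count() or 1  # os.cpu_count() can return None
        return min(max_requests_per_minute, cpu * 20)

    # e.g. RPM=500 on an 8-core machine -> min(500, 160) == 160 workers
    print(default_max_workers(500))

Capping workers at the RPM limit avoids spawning threads that would only sit
idle waiting on the rate limiter.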
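The two environment variables above suggest a simple precedence order: an
explicit override wins, then the LiteLLM-reported value, then the fallback.
A sketch assuming that combined ordering (the resolver function is
hypothetical, not part of the library's API):

.. code-block:: python

    import os
    from typing import Optional

    def resolve_max_input_tokens(reported: Optional[int]) -> int:
        """Hypothetical resolver. ``reported`` stands in for the
        max-input-tokens value LiteLLM reports (None if unavailable)."""
        override = os.getenv("MAX_INPUT_TOKENS_OVERRIDE")
        if override is not None:
            return int(override)  # override beats the reported value
        if reported is not None:
            return reported
        # Sanitizer fallback when no model info is available.
        return int(os.getenv("MAX_CONTEXT_TOKENS_FALLBACK", "100000"))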
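To complement the logging setup above, individual loggers can be tuned
independently of the root level; the dependency logger name used here is an
assumption about the underlying library:

.. code-block:: python

    import logging

    # Keep the application at INFO but silence a chatty dependency.
    # "LiteLLM" is assumed to be litellm's logger name; verify for
    # your installed version.
    logging.getLogger("LiteLLM").setLevel(logging.WARNING)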