Usage Examples
==============

Basic concurrent requests
-------------------------

.. code-block:: python

    from llm_api_client import APIClient

    client = APIClient(max_requests_per_minute=200, max_tokens_per_minute=200000)

    prompts = [
        "Summarize the plot of The Matrix in one paragraph.",
        "List three benefits of unit testing.",
        "Translate 'good morning' to Spanish.",
    ]

    requests_data = [
        {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.4,
        }
        for prompt in prompts
    ]

    responses = client.make_requests(requests_data)
    for r in responses:
        print(r.choices[0].message.content)

Retries with backoff
--------------------

Use :py:meth:`llm_api_client.api_client.APIClient.make_requests_with_retries` to
automatically retry failed calls:

.. code-block:: python

    from llm_api_client import APIClient

    client = APIClient(max_requests_per_minute=100)

    requests_data = [
        {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}
        for _ in range(10)
    ]

    responses = client.make_requests_with_retries(
        requests_data,
        max_retries=2,
        sanitize=True,
        timeout=60,
    )
    print(client.tracker)  # usage stats

Streaming and large context handling
------------------------------------

Requests are sanitized to fit model/provider constraints when ``sanitize=True``.

.. code-block:: python

    from llm_api_client import APIClient

    client = APIClient()

    long_history = [
        {"role": "user", "content": "... a very long conversation ..."},
        # many messages
    ]

    response = client.make_requests([
        {"model": "gpt-4o-mini", "messages": long_history}
    ])[0]

Accessing history and usage
---------------------------

.. code-block:: python

    print(client.history)          # list of request/response entries
    print(client.tracker.details)  # dict with costs/tokens/latency stats
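
Because ``client.history`` is a plain list, standard-library tools are enough for a
quick look. A minimal sketch, assuming at least one request has already been made
(the exact shape of each entry depends on the library version):

.. code-block:: python

    from pprint import pprint

    print(f"{len(client.history)} request/response entries recorded")
    pprint(client.history[-1])  # inspect the most recent entry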
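
Similarly, ``client.tracker.details`` is described above only as a dict of
cost/token/latency statistics, so the sketch below serializes whatever keys are
present rather than assuming a fixed schema:

.. code-block:: python

    import json

    # default=str guards against values (e.g. datetimes) that are not
    # JSON-serializable out of the box
    print(json.dumps(client.tracker.details, indent=2, default=str))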