# Usage Examples

## Basic concurrent requests

```python
from llm_api_client import APIClient

# Rate limits are enforced client-side across all in-flight requests
client = APIClient(max_requests_per_minute=200, max_tokens_per_minute=200000)

prompts = [
    "Summarize the plot of The Matrix in one paragraph.",
    "List three benefits of unit testing.",
    "Translate 'good morning' to Spanish.",
]

requests_data = [
    {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.4,
    }
    for prompt in prompts
]

# Requests are issued concurrently; results come back as a list
responses = client.make_requests(requests_data)

for r in responses:
    print(r.choices[0].message.content)
```

## Retries with backoff

Use `llm_api_client.api_client.APIClient.make_requests_with_retries()` to retry failed calls automatically:

```python
from llm_api_client import APIClient

client = APIClient(max_requests_per_minute=100)

requests_data = [
    {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}
    for _ in range(10)
]

responses = client.make_requests_with_retries(
    requests_data,
    max_retries=2,   # retry each failed request up to two more times
    sanitize=True,   # adjust requests to fit model/provider constraints
    timeout=60,      # timeout in seconds
)

print(client.tracker)  # usage stats
```
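A retry wrapper of this kind typically waits longer between each attempt. The helper below is an illustrative sketch of that exponential-backoff pattern, not the library's actual implementation; the `base_delay` parameter, the doubling schedule, and the injectable `sleep` argument are all assumptions for the sake of the example.

```python
import time


def retry_with_backoff(call, max_retries=2, base_delay=1.0, sleep=time.sleep):
    """Call `call()`, retrying on exception and doubling the delay each time."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # e.g. 1s, 2s, 4s, ...
```

Making `sleep` a parameter keeps the backoff schedule testable without real waiting.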

## Large context handling

With `sanitize=True`, requests are sanitized to fit model and provider constraints before they are sent:

```python
from llm_api_client import APIClient

client = APIClient()

long_history = [
    {"role": "user", "content": "... a very long conversation ..."},
    # many messages
]

# sanitize=True adjusts the request to fit the model's context limits
response = client.make_requests_with_retries(
    [{"model": "gpt-4o-mini", "messages": long_history}],
    sanitize=True,
)[0]
```
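To make the idea of fitting a conversation into a context window concrete, here is a minimal sketch of one possible trimming strategy. This is not the library's sanitizer: the drop-oldest-first policy and the rough chars-per-token heuristic are assumptions; a real implementation would use the model's tokenizer.

```python
def trim_history(messages, max_tokens, chars_per_token=4):
    """Drop the oldest messages until the estimated token count fits the budget."""
    def estimate(msg):
        # Crude heuristic: roughly `chars_per_token` characters per token
        return max(1, len(msg["content"]) // chars_per_token)

    kept = list(messages)
    while kept and sum(estimate(m) for m in kept) > max_tokens:
        kept.pop(0)  # drop the oldest message first
    return kept
```

Dropping from the front preserves the most recent turns, which usually matter most for the model's next reply.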

## Accessing history and usage

```python
print(client.history)            # list of request/response entries
print(client.tracker.details)    # dict with costs/tokens/latency stats
```
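If you want to aggregate token usage yourself, you can sum the OpenAI-style `usage` dicts attached to each response. The helper below is a sketch under that assumption; the exact shape of the library's history entries may differ, so treat the input format here as hypothetical.

```python
def summarize_usage(usage_dicts):
    """Sum prompt/completion token counts across OpenAI-style usage dicts."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    for usage in usage_dicts:
        for key in totals:
            totals[key] += usage.get(key, 0)
    totals["total_tokens"] = totals["prompt_tokens"] + totals["completion_tokens"]
    return totals
```

This kind of rollup is handy for cross-checking the numbers reported by `client.tracker`.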