## Usage Examples

### Basic concurrent requests
```python
from llm_api_client import APIClient

client = APIClient(max_requests_per_minute=200, max_tokens_per_minute=200000)

prompts = [
    "Summarize the plot of The Matrix in one paragraph.",
    "List three benefits of unit testing.",
    "Translate 'good morning' to Spanish.",
]

requests_data = [
    {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.4,
    }
    for prompt in prompts
]

responses = client.make_requests(requests_data)
for r in responses:
    print(r.choices[0].message.content)
```
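To pair each reply with the prompt that produced it, you can zip the two lists; this sketch assumes `make_requests()` returns responses in input order, which the examples rely on but do not state explicitly:

```python
# Pair prompts with replies. Input-order preservation is an assumption
# here; check your version of llm_api_client if ordering matters.
for prompt, response in zip(prompts, responses):
    print(f"Prompt: {prompt}")
    print(f"Reply:  {response.choices[0].message.content}\n")
```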
### Retries with backoff

Use `llm_api_client.api_client.APIClient.make_requests_with_retries()` to automatically retry failed calls:
```python
from llm_api_client import APIClient

client = APIClient(max_requests_per_minute=100)

requests_data = [
    {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}
    for _ in range(10)
]

responses = client.make_requests_with_retries(
    requests_data,
    max_retries=2,
    sanitize=True,
    timeout=60,
)

print(client.tracker)  # usage stats
```
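A quick way to count how many requests ultimately succeeded. This sketch assumes a request that exhausts its retries yields a `None` entry, which is a hypothetical convention; adapt the check to however your version actually reports failures:

```python
# Hypothetical failure marker: treat None entries as exhausted retries.
# Replace this check with whatever your version of the library returns
# for requests that still fail after max_retries attempts.
succeeded = [r for r in responses if r is not None]
print(f"{len(succeeded)}/{len(requests_data)} requests succeeded")
```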
### Large context handling

When `sanitize=True`, requests are sanitized to fit model/provider constraints, such as a message history that would exceed the model's context window:
```python
from llm_api_client import APIClient

client = APIClient()

long_history = [
    {"role": "user", "content": "... a very long conversation ..."},
    # many messages
]

# sanitize=True mirrors make_requests_with_retries() above; it is
# assumed here that make_requests() accepts the same flag.
response = client.make_requests(
    [{"model": "gpt-4o-mini", "messages": long_history}],
    sanitize=True,
)[0]
```
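To see how close a conversation is to a model's context window before sending it, you can estimate its size with the separate `tiktoken` library (not part of `llm_api_client`; the count below ignores per-message chat formatting overhead):

```python
import tiktoken

# Rough token estimate for the whole history. tiktoken's model table
# resolves "gpt-4o-mini" to its encoding; role markers and message
# separators are not included in this count.
enc = tiktoken.encoding_for_model("gpt-4o-mini")
total_tokens = sum(len(enc.encode(m["content"])) for m in long_history)
print(f"~{total_tokens} tokens before sanitization")
```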
### Accessing history and usage

```python
print(client.history)          # list of request/response entries
print(client.tracker.details)  # dict with cost/token/latency stats
```
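The exact shape of a history entry is not documented above, so the safest starting point is to inspect one; a minimal sketch, assuming only that `client.history` is a list:

```python
from pprint import pprint

# Dump the most recent request/response entry to see its structure.
if client.history:
    pprint(client.history[-1])
```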