autobatcher

Drop-in replacement for AsyncOpenAI that transparently batches requests. This library is designed for use with the Doubleword Batch API. Support for OpenAI's batch API and other compatible APIs is best effort. If you experience any issues, please open an issue.

Why?

Batch LLM APIs offer 50% cost savings (and specialist inference providers like Doubleword offer 80%+ savings), but these APIs require you to restructure your code around file uploads and polling. autobatcher lets you keep your existing async code while getting batch pricing automatically.

# Before: regular async calls (full price)
from openai import AsyncOpenAI
client = AsyncOpenAI()

# After: batched calls (50% off)
from autobatcher import BatchOpenAI
client = BatchOpenAI()

# Same interface, same code
response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

How it works

  1. Requests are collected over a configurable time window (default: 1 second)
  2. When the window closes or the batch size is reached, requests are submitted as a batch (see the sketch after this list)
  3. Results are polled and returned to waiting callers as they complete
  4. Your code sees normal ChatCompletion responses
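
The collect-and-submit mechanism can be pictured with a short, self-contained sketch. This is not autobatcher's internal code; it only illustrates how callers' requests are gathered into a window and answered together, and the queue, helper names, and echoed response are all illustrative:

import asyncio
import time


async def batching_loop(queue, batch_size=3, batch_window_seconds=1.0):
    while True:
        pending = []
        deadline = time.monotonic() + batch_window_seconds
        # Collect (request, future) pairs until the window closes or the batch fills.
        while len(pending) < batch_size:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                pending.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        if pending:
            # The real library would upload these as a batch file, poll for
            # completion, and map results back to each caller; here we just echo.
            for request, future in pending:
                future.set_result(f"batched response to {request!r}")


async def main():
    queue = asyncio.Queue()
    loop_task = asyncio.create_task(batching_loop(queue))

    async def call(prompt):
        future = asyncio.get_running_loop().create_future()
        await queue.put((prompt, future))  # enqueue, then wait for the batch result
        return await future

    # Both calls land in the same window and are resolved together.
    print(await asyncio.gather(call("Hello!"), call("What is 2+2?")))
    loop_task.cancel()


asyncio.run(main())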

Installation

pip install autobatcher

Usage

import asyncio
from autobatcher import BatchOpenAI

async def main():
    client = BatchOpenAI(
        api_key="sk-...",  # or set OPENAI_API_KEY env var
        batch_size=100,              # submit batch when this many requests queued
        batch_window_seconds=1.0,    # or after this many seconds
        poll_interval_seconds=5.0,   # how often to check for results
    )

    # Use exactly like AsyncOpenAI
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is 2+2?"}],
    )
    print(response.choices[0].message.content)

    await client.close()

asyncio.run(main())

Parallel requests

The real power comes when you have many requests:

async def process_many(prompts: list[str]) -> list[str]:
    client = BatchOpenAI(batch_size=50, batch_window_seconds=2.0)

    async def get_response(prompt: str) -> str:
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # All requests are batched together automatically
    results = await asyncio.gather(*[get_response(p) for p in prompts])

    await client.close()
    return results

Context manager

async with BatchOpenAI() as client:
    response = await client.chat.completions.create(...)

Configuration

Parameter              Default  Description
api_key                None     OpenAI API key (falls back to the OPENAI_API_KEY env var)
base_url               None     API base URL (for proxies or compatible APIs)
batch_size             100      Submit a batch when this many requests are queued
batch_window_seconds   1.0      Submit a batch after this many seconds
poll_interval_seconds  5.0      How often to poll for batch completion
completion_window      "24h"    Batch completion window ("24h" or "1h")
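
As an example, a client tuned for large offline jobs against an OpenAI-compatible endpoint might be configured like this; the base_url value below is a placeholder, not a real endpoint:

from autobatcher import BatchOpenAI

# Hypothetical configuration; the base_url is a placeholder endpoint.
client = BatchOpenAI(
    api_key="sk-...",
    base_url="https://example.com/v1",  # any OpenAI-compatible batch API
    batch_size=500,                     # wait for more requests per batch
    batch_window_seconds=10.0,          # or at most 10 seconds
    completion_window="1h",             # where the provider offers a 1-hour window
)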

Limitations

  • Only chat.completions.create is supported for now
  • The Batch API has a 24-hour completion window by default; Doubleword also offers a 1-hour SLA
  • No escalation when the completion window elapses (see the timeout sketch after this list)
  • Not suitable for real-time/interactive use cases
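
If you need a bound on how long a caller waits, one option is to wrap individual calls in a standard asyncio timeout. This is plain asyncio, not a feature of the library, and the 600-second value is only illustrative:

import asyncio

# Stop waiting after 10 minutes; the underlying batch may still complete later.
try:
    response = await asyncio.wait_for(
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello!"}],
        ),
        timeout=600,
    )
except asyncio.TimeoutError:
    response = None  # retry, fall back to a regular AsyncOpenAI call, etc.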

License

MIT