autobatcher

Drop-in replacement for AsyncOpenAI that transparently batches requests. This library is designed for use with the Doubleword Batch API. Support for OpenAI's batch API and other compatible APIs is best effort. If you experience any issues, please open an issue.

Why?

Batch LLM APIs offer 50% cost savings (and specialist inference providers like Doubleword offer 80%+ savings), but these APIs require you to restructure your code around file uploads and polling. autobatcher lets you keep your existing async code while getting batch pricing automatically.

# Before: regular async calls (full price)
from openai import AsyncOpenAI
client = AsyncOpenAI()

# After: batched calls (50% off)
from autobatcher import BatchOpenAI
client = BatchOpenAI()

# Same interface, same code
response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

How it works

  1. Requests are collected over a configurable time window (default: 1 second)
  2. When the window closes or the batch size is reached, requests are submitted as a batch (see the sketch after this list)
  3. Results are polled and returned to waiting callers as they complete
  4. Your code sees normal ChatCompletion responses
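
The collect-and-submit mechanism can be pictured with a short, self-contained sketch. This is not autobatcher's internal code; it only illustrates how callers' requests are gathered into a window and answered together, and the queue, helper names, and echoed response are all illustrative:

import asyncio
import time


async def batching_loop(queue, batch_size=3, batch_window_seconds=1.0):
    while True:
        pending = []
        deadline = time.monotonic() + batch_window_seconds
        # Collect (request, future) pairs until the window closes or the batch fills.
        while len(pending) < batch_size:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                pending.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        if pending:
            # The real library would upload these as a batch file, poll for
            # completion, and map results back to each caller; here we just echo.
            for request, future in pending:
                future.set_result(f"batched response to {request!r}")


async def main():
    queue = asyncio.Queue()
    loop_task = asyncio.create_task(batching_loop(queue))

    async def call(prompt):
        future = asyncio.get_running_loop().create_future()
        await queue.put((prompt, future))  # enqueue, then wait for the batch result
        return await future

    # Both calls land in the same window and are resolved together.
    print(await asyncio.gather(call("Hello!"), call("What is 2+2?")))
    loop_task.cancel()


asyncio.run(main())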

Installation

pip install autobatcher

Usage

import asyncio
from autobatcher import BatchOpenAI

async def main():
    client = BatchOpenAI(
        api_key="sk-...",  # or set OPENAI_API_KEY env var
        batch_size=100,              # submit batch when this many requests queued
        batch_window_seconds=1.0,    # or after this many seconds
        poll_interval_seconds=5.0,   # how often to check for results
    )

    # Use exactly like AsyncOpenAI
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is 2+2?"}],
    )
    print(response.choices[0].message.content)

    await client.close()

asyncio.run(main())

Parallel requests

The real power comes when you have many requests:

async def process_many(prompts: list[str]) -> list[str]:
    client = BatchOpenAI(batch_size=50, batch_window_seconds=2.0)

    async def get_response(prompt: str) -> str:
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # All requests are batched together automatically
    results = await asyncio.gather(*[get_response(p) for p in prompts])

    await client.close()
    return results

Context manager

async with BatchOpenAI() as client:
    response = await client.chat.completions.create(...)

Configuration

Parameter              Default  Description
api_key                None     OpenAI API key (falls back to the OPENAI_API_KEY env var)
base_url               None     API base URL (for proxies or compatible APIs)
batch_size             100      Submit a batch when this many requests are queued
batch_window_seconds   1.0      Submit a batch after this many seconds
poll_interval_seconds  5.0      How often to poll for batch completion
completion_window      "24h"    Batch completion window ("24h" or "1h")
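
As an example, a client tuned for large offline jobs against an OpenAI-compatible endpoint might be configured like this; the base_url value below is a placeholder, not a real endpoint:

from autobatcher import BatchOpenAI

# Hypothetical configuration; the base_url is a placeholder endpoint.
client = BatchOpenAI(
    api_key="sk-...",
    base_url="https://example.com/v1",  # any OpenAI-compatible batch API
    batch_size=500,                     # wait for more requests per batch
    batch_window_seconds=10.0,          # or at most 10 seconds
    completion_window="1h",             # where the provider offers a 1-hour window
)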

Limitations

  • Only chat.completions.create is supported for now
  • The Batch API has a 24-hour completion window by default; Doubleword also offers a 1-hour SLA
  • No escalation when the completion window elapses (see the timeout sketch after this list)
  • Not suitable for real-time/interactive use cases
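
If you need a bound on how long a caller waits, one option is to wrap individual calls in a standard asyncio timeout. This is plain asyncio, not a feature of the library, and the 600-second value is only illustrative:

import asyncio

# Stop waiting after 10 minutes; the underlying batch may still complete later.
try:
    response = await asyncio.wait_for(
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello!"}],
        ),
        timeout=600,
    )
except asyncio.TimeoutError:
    response = None  # retry, fall back to a regular AsyncOpenAI call, etc.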

License

MIT