Doubleword

Intro to Doubleword Inference

Doubleword provides three styles of inference, each optimized for different workloads. Async and batch inference offer significant cost savings over real-time pricing by deferring processing from synchronous to asynchronous execution.

All three styles use the same OpenAI-compatible API format and share the same model catalog.

| | Realtime | Async | Batch |
|---|---|---|---|
| How it works | Standard request-response | Autobatcher defers calls to async processing | Upload JSONL file, retrieve results later |
| Latency | Immediate | Minutes | Hours |
| SLA | None | 1 hour | 1 hour or 24 hours |
| Cost | Standard pricing | Reduced pricing | Lowest pricing (24h SLA) |
| API change | None (drop-in OpenAI replacement) | Swap SDK import only | JSONL file preparation |
| Best for | Interactive chat, prototyping, prompt iteration | Agentic workflows, background pipelines, production workloads | Dataset processing, evaluations, bulk generation |

Realtime Inference

Realtime inference works exactly like the standard OpenAI API — send a request, get an immediate response. It's ideal for interactive use cases, development, and prototyping.

No cost savings, but no latency trade-off either.
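Because the API is OpenAI-compatible, a realtime call is an ordinary chat-completions request. A minimal stdlib sketch of building one (the base URL, API key, and model name below are placeholders, not confirmed Doubleword values):

```python
import json
import urllib.request

# Placeholders: substitute your actual Doubleword endpoint, key, and model.
BASE_URL = "https://api.example.com/v1"
API_KEY = "your-api-key"

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-compatible POST to /chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("your-model", [{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req)  # in realtime mode, this returns immediately
```

The same request body works unchanged with the official OpenAI SDK pointed at the Doubleword base URL.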

Get started with Realtime Inference →


Async Inference

Async inference uses the Autobatcher to automatically convert your API calls into high-priority asynchronous requests. It's a drop-in replacement for the OpenAI SDK — your existing code works with a single import change.

Because requests are deferred from real-time to async processing, you get significant cost savings while keeping the same familiar API interface.
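The swap can be sketched as follows. The `doubleword` package name in the comment is an assumption (check the Autobatcher docs for the real module), and `summarize` is a hypothetical helper; the point is that call sites stay identical:

```python
# The only change is the import (package name is a placeholder):
#
#   from openai import OpenAI        # before: realtime
#   from doubleword import OpenAI    # after: calls deferred via the Autobatcher

def summarize(client, text: str) -> str:
    # Call sites are identical with either client, because the Autobatcher
    # exposes the same chat.completions interface as the OpenAI SDK.
    resp = client.chat.completions.create(
        model="your-model",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return resp.choices[0].message.content
```

Application code like `summarize` never needs to know which client it was handed, which is what makes the migration a one-line change.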

Best suited for:

  • Multi-step agentic workflows where each call doesn't need an instant response
  • Background content generation and classification pipelines
  • Any application code that can tolerate short async delays
  • Teams migrating from OpenAI who want immediate cost savings with zero refactoring

Get started with Async Inference →


Batch Inference

Batch inference is designed for large-scale data processing workloads that run outside of your application code. You upload requests as JSONL files and retrieve results when processing is complete.

With a 24-hour SLA, batch inference offers the deepest cost savings — ideal for workloads where turnaround time is measured in hours, not seconds.
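Preparing the input file can be sketched as below. This assumes the OpenAI-style batch line format (`custom_id` / `method` / `url` / `body`), which the OpenAI-compatible API suggests but which you should confirm in the Batch Inference guide before uploading:

```python
import json

def to_jsonl(prompts, model="your-model"):
    """Serialize one self-contained chat request per JSONL line."""
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"request-{i}",   # used to match results back to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return "\n".join(lines)

jsonl = to_jsonl(["Classify this review.", "Summarize this article."])
# Write `jsonl` to a file, upload it, then retrieve results once the batch
# completes within the chosen SLA window.
```

Because results arrive out of band, the `custom_id` field is what lets you join each output row back to the prompt that produced it.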

Best suited for:

  • Large dataset processing and transformation
  • Model evaluations and benchmarking
  • Bulk content generation and classification
  • Research workflows and data enrichment

Get started with Batch Inference →