Doubleword Documentation

JSONL (JSON Lines) is a text format where each line is a valid, independent JSON object. We use the JSONL format for submitting LLM requests in bulk to the Doubleword Batch API.

Requests are submitted to the Doubleword API as JSONL files that look like the following:

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}]}}
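Because every line is an independent JSON object, a JSONL file can be parsed one line at a time. The following is a minimal sketch using only the Python standard library. The helper name parse_jsonl is hypothetical and not part of any Doubleword client, and the sample lines are shortened versions of the requests above.

```python
import json

def parse_jsonl(text):
    """Parse JSONL text into a list of Python objects, one per non-empty line."""
    records = []
    # Split on "\n" only, so unusual Unicode line separators inside strings are left alone.
    for line_number, line in enumerate(text.split("\n"), 1):
        if not line.strip():
            continue  # tolerate blank lines, such as the one after a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
    return records

# Shortened versions of the two example requests (assumed for illustration).
sample = (
    '{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", '
    '"body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "user", "content": "Hello world!"}]}}\n'
    '{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", '
    '"body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "user", "content": "Hello world!"}]}}\n'
)

records = parse_jsonl(sample)
custom_ids = [record["custom_id"] for record in records]
```

Each parsed record is an ordinary dictionary, so fields such as custom_id and body can be read directly.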

Creating a JSONL File for Batch Processing

This guide walks you through converting a real-time chat completion request into a JSONL file, ready for batch submission.

Starting with a Standard Chat Request

Standard chat requests follow the chat completions API format. The request body is a JSON object containing the model and messages.

For example, if we want to know the capital of Colorado, we submit an HTTP request with the following body:

{
  "model": "gpt-4",
  "messages": [
    {"role": "user", "content": "What is the capital of Colorado?"}
  ]
}

Scaling to Multiple Requests

Let's say we want to ask about all 50 state capitals instead of just Colorado. Rather than making 50 separate API calls, we can submit them together as a single batch. Then, once the system has completed processing all of our requests, we can retrieve the results in one go.

The Doubleword Batch API offers an OpenAI-compatible batch endpoint. As in OpenAI's batch format, each line in your JSONL file must contain the following fields:

custom_id: Your unique identifier for tracking the request (string, max 64 characters)
method: HTTP method, always "POST"
url: API endpoint, typically "/v1/chat/completions". For embedding models, "/v1/embeddings" is also supported.
body: The actual API request parameters (model, messages, temperature, etc.). This is the same body as your real-time request.

Important: Each line must be a complete, valid JSON object with no line breaks within the object itself.
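The field list above can be sketched as a small helper that wraps a real-time request body in the batch envelope. This is an illustrative sketch, not an official client: the function name to_batch_line and the custom_id value are assumptions. It relies on json.dumps, which by default emits no line breaks and escapes any newline inside a string, so the output always fits on one line.

```python
import json

def to_batch_line(custom_id, body, url="/v1/chat/completions"):
    """Wrap a real-time request body in the batch envelope and return one JSONL line."""
    if len(custom_id) > 64:
        raise ValueError("custom_id must be at most 64 characters")
    request = {
        "custom_id": custom_id,
        "method": "POST",  # always POST, per the field list above
        "url": url,
        "body": body,      # same body as the real-time request
    }
    # Without an indent argument, json.dumps produces a single line;
    # newlines inside string values are escaped as \n.
    return json.dumps(request)

# A prompt that itself contains a line break still serializes to one line.
body = {
    "model": "gpt-4",
    "messages": [
        {"role": "user", "content": "What is the capital of Colorado?\nAnswer in one word."}
    ],
}
line = to_batch_line("colorado-1", body)
```

Parsing the line back with json.loads recovers the original body unchanged, including the line break inside the prompt.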

Example: Building a JSONL File with Python

Here's how to convert our Colorado capital request into a JSONL file for all 50 states:

import json

def create_state_capitals_jsonl(output_file="batch_requests.jsonl"):
    """Create a JSONL file with capital requests for all 50 US states."""

    states = [
        "Alabama", "Alaska", "Arizona", "Arkansas", "California",
        "Colorado", "Connecticut", "Delaware", "Florida", "Georgia",
        "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa",
        "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland",
        "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri",
        "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey",
        "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
        "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
        "South Dakota", "Tennessee", "Texas", "Utah", "Vermont",
        "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"
    ]

    with open(output_file, 'w', encoding='utf-8') as f:
        for i, state in enumerate(states, 1):
            request = {
                "custom_id": f"state-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4",
                    "messages": [
                        {"role": "user", "content": f"What is the capital of {state}?"}
                    ]
                }
            }
            f.write(json.dumps(request) + '\n')

    print(f"Created {output_file} with {len(states)} requests")

# Run the function
create_state_capitals_jsonl()

Result

This creates a JSONL file ready to be uploaded to the Batch API /v1/files endpoint. Here's what the first few lines look like:

{"custom_id": "state-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4", "messages": [{"role": "user", "content": "What is the capital of Alabama?"}]}}
{"custom_id": "state-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4", "messages": [{"role": "user", "content": "What is the capital of Alaska?"}]}}
{"custom_id": "state-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4", "messages": [{"role": "user", "content": "What is the capital of Arizona?"}]}}

Common Pitfalls

When creating JSONL files, watch out for:

Line breaks within JSON objects: Each JSON object must be on a single line
Invalid JSON syntax: Validate your JSON before adding it to the file
Duplicate custom_id values: Each request must have a unique identifier
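These pitfalls can be caught locally before upload. The following is a minimal sketch of such a check, assuming only the rules stated in this guide. The function name find_jsonl_problems is hypothetical, and the server may enforce further rules that this check does not cover.

```python
import json

REQUIRED_FIELDS = ("custom_id", "method", "url", "body")

def find_jsonl_problems(lines):
    """Return a list of human-readable problems found in an iterable of JSONL lines."""
    problems = []
    seen_ids = set()
    for number, line in enumerate(lines, 1):
        if not line.strip():
            continue  # skip blank lines
        try:
            request = json.loads(line)
        except json.JSONDecodeError as exc:
            # An object broken across several lines shows up here,
            # because each fragment fails to parse on its own.
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(request, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in request]
        if missing:
            problems.append(f"line {number}: missing fields {missing}")
        custom_id = request.get("custom_id")
        if isinstance(custom_id, str):
            if custom_id in seen_ids:
                problems.append(f"line {number}: duplicate custom_id {custom_id!r}")
            seen_ids.add(custom_id)
    return problems

# Illustrative input: one good line, a duplicate of it, and a truncated object.
good = '{"custom_id": "a", "method": "POST", "url": "/v1/chat/completions", "body": {}}'
broken = '{"custom_id": "b",'
problems = find_jsonl_problems([good, good, broken])
```

Because an open file yields its lines when iterated, the same function can be applied to a file on disk by passing the file object, for example one opened on batch_requests.jsonl.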