Doubleword

Model Pricing

Doubleword Batch API is priced per model based on token usage. Costs are calculated separately for input tokens (the content you send) and output tokens (the content generated by the model).

The table below outlines pricing for the models we currently have available. If you are interested in pricing for a model not listed below, please reach out to support@doubleword.ai.

| Model Name | SLA | Input Tokens (per 1M) | Output Tokens (per 1M) |
| --- | --- | --- | --- |
| Qwen/Qwen3-Embedding-8B | Realtime¹ | $0.04 | $0.00 |
| Qwen/Qwen3-Embedding-8B | 1hr | $0.03 | $0.00 |
| Qwen/Qwen3-Embedding-8B | 24hr | $0.02 | $0.00 |
| Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 | Realtime¹ | $0.16 | $0.80 |
| Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 | 1hr | $0.07 | $0.30 |
| Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 | 24hr | $0.05 | $0.20 |
| Qwen/Qwen3-VL-235B-A22B-Instruct-FP8 | Realtime¹ | $0.60 | $1.20 |
| Qwen/Qwen3-VL-235B-A22B-Instruct-FP8 | 1hr | $0.15 | $0.55 |
| Qwen/Qwen3-VL-235B-A22B-Instruct-FP8 | 24hr | $0.10 | $0.40 |

If you'd like to estimate the cost of your job, please upload your file in the Doubleword Console to view a cost estimate prior to submitting a batch.
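As a rough illustration of how per-token pricing combines, the sketch below computes a client-side estimate from the 24hr-SLA prices in the table above (hardcoded here; the Console estimate remains the authoritative figure, and prices may change):

```python
# Rough cost estimate for a batch job, using the 24hr-SLA prices
# from the pricing table above (USD per 1M tokens).
PRICES_24HR = {
    "Qwen/Qwen3-Embedding-8B": {"input": 0.02, "output": 0.00},
    "Qwen/Qwen3-VL-30B-A3B-Instruct-FP8": {"input": 0.05, "output": 0.20},
    "Qwen/Qwen3-VL-235B-A22B-Instruct-FP8": {"input": 0.10, "output": 0.40},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one batch job at the 24hr SLA."""
    p = PRICES_24HR[model]
    return (input_tokens / 1_000_000) * p["input"] + (output_tokens / 1_000_000) * p["output"]

# e.g. 10M input + 2M output tokens on the 30B model:
# 10 * $0.05 + 2 * $0.20 = $0.90
cost = estimate_cost("Qwen/Qwen3-VL-30B-A3B-Instruct-FP8", 10_000_000, 2_000_000)
```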

Note

SLA indicates the maximum processing time for batch requests. Actual processing times are typically faster than the stated SLA.

Model Details

Qwen/Qwen3-Embedding-8B


The Qwen3 Embedding series is the latest generation of embedding models in the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

Exceptional Versatility: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B size embedding model ranks No.1 in the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58), while the reranking model excels in various text retrieval scenarios.

Comprehensive Flexibility: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

Multilingual Capability: The Qwen3 Embedding series offers support for over 100 languages, thanks to the multilingual capabilities of Qwen3 models. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.

Qwen3-Embedding-8B has the following features:

  • Model Type: Text Embedding
  • Supported Languages: 100+ Languages
  • Number of Parameters: 8B
  • Context Length: 32k
  • Embedding Dimension: Up to 4096, supports user-defined output dimensions ranging from 32 to 4096

For more details, including benchmark evaluations, hardware requirements, and inference performance, please refer to the Qwen team's blog and GitHub repository.
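To illustrate the user-defined output dimensions mentioned above, here is a sketch of a single batch-request line for an OpenAI-compatible `/v1/embeddings` endpoint. The endpoint path and field names (`custom_id`, `dimensions`, etc.) follow the common OpenAI batch-file convention and are assumptions, not confirmed by this page:

```python
import json

# Hypothetical batch-request line for an OpenAI-compatible /v1/embeddings
# endpoint; field names are assumed, not confirmed by this page.
request = {
    "custom_id": "doc-0001",
    "method": "POST",
    "url": "/v1/embeddings",
    "body": {
        "model": "Qwen/Qwen3-Embedding-8B",
        "input": "Qwen3 supports user-defined embedding dimensions.",
        "dimensions": 1024,  # any value from 32 to 4096 per the model card
    },
}

# Each line of a batch input file is one such JSON object.
line = json.dumps(request)
```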

Qwen/Qwen3-VL-30B-A3B-Instruct-FP8


Meet Qwen3-VL-30B, the smaller model of the Qwen3-VL family, delivering performance similar to GPT-4.1-mini and Claude Sonnet 4. This highly capable mid-size model is suited to cost-sensitive tasks and high token volumes, and excels at reasoning, coding, and structured output generation.

Best for:

  • Production workloads requiring strong performance without frontier model costs
  • Complex reasoning tasks
  • Code generation

Qwen/Qwen3-VL-235B-A22B-Instruct-FP8


Meet Qwen3-VL-235B, our most powerful model, delivering performance similar to GPT-5 Chat and Claude 4 Opus Thinking on challenging tasks including advanced reasoning, mathematics, and complex code generation. It offers frontier-level capabilities at a fraction of the cost.

Best for:

  • Tasks requiring maximum intelligence
  • Complex analysis
  • Sophisticated coding projects
  • Scenarios where quality justifies the additional cost over smaller models

Max New Tokens: 16384

Max Total Tokens: 262144

Sampling Parameters:

We have set the default sampling parameters using the recommended values set out by the Qwen team:


We suggest using Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.

For supported frameworks, you can adjust the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.


We use a default presence_penalty of 1.5 to bias the model against endless repetitions; if you still notice this behaviour, try increasing the presence_penalty.

You can adjust these on a per-request basis by setting the sampling parameters in the request body.
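As a sketch of what a per-request override might look like, the request body below uses the Qwen-recommended defaults quoted above. The field names follow the common OpenAI-style chat-completions schema (with `top_k`/`min_p` as vLLM-style extensions); the exact schema Doubleword accepts is an assumption here:

```python
# Hypothetical per-request sampling overrides, using the Qwen-recommended
# defaults quoted above. Field names are assumed (OpenAI-style schema with
# top_k/min_p extensions), not confirmed by this page.
body = {
    "model": "Qwen/Qwen3-VL-235B-A22B-Instruct-FP8",
    "messages": [{"role": "user", "content": "Summarise this contract."}],
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "min_p": 0,
    "presence_penalty": 1.5,  # raise toward 2 if repetitions persist
    "max_tokens": 16384,      # must not exceed Max New Tokens above
}
```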

Footnotes

  1. Realtime availability is limited. Doubleword is primarily a batch API.