Doubleword

LlamaIndex

The llamaindex-doubleword package provides Doubleword LLM and embedding models for LlamaIndex, with both real-time and batch variants.

Install

pip install llamaindex-doubleword

Chat / Completions

from llamaindex_doubleword import DoublewordLLM

llm = DoublewordLLM(
    model="{{selectedModel.id}}",
    api_key="{{apiKey}}",
)

response = llm.complete("Say hello.")
print(response.text)

Tool calling

DoublewordLLM supports function calling, so you can use it with LlamaIndex's agent framework. The model decides when to call tools, receives the results, and formulates a final answer:

import asyncio

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.tools import FunctionTool
from llamaindex_doubleword import DoublewordLLM

def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

llm = DoublewordLLM(
    model="{{selectedModel.id}}",
    api_key="{{apiKey}}",
)

agent = AgentWorkflow.from_tools_or_functions(
    [FunctionTool.from_defaults(fn=calculator)],
    llm=llm,
)

async def main():
    # agent.run() returns an awaitable workflow handler
    response = await agent.run("What is 137 * 49?")
    print(response)

asyncio.run(main())

The agent runs a model→tool→model loop until the model produces a final answer without requesting any more tool calls.
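To make that loop concrete, here is a minimal, self-contained sketch of it using a hypothetical stub model in place of a real LLM (the `StubModel` class and `run_agent` helper are illustrative only, not part of llamaindex-doubleword or LlamaIndex):

```python
def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

class StubModel:
    """Hypothetical model: requests one tool call, then answers."""
    def __init__(self):
        self.step = 0

    def respond(self, messages):
        self.step += 1
        if self.step == 1:
            # First pass: the model asks for a tool call.
            return {"tool": "calculator", "args": {"expression": "137 * 49"}}
        # Second pass: the tool result is in the history; give a final answer.
        return {"answer": f"137 * 49 = {messages[-1]['tool_result']}"}

def run_agent(model, tools, user_msg):
    """Model -> tool -> model loop, ending when no tool call is requested."""
    messages = [{"role": "user", "content": user_msg}]
    while True:
        reply = model.respond(messages)
        if "tool" not in reply:  # no more tool calls: done
            return reply["answer"]
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "tool_result": result})

print(run_agent(StubModel(), {"calculator": calculator}, "What is 137 * 49?"))
# → 137 * 49 = 6713
```

The real agent does the same thing, with the model's tool-call decisions coming from the Doubleword API and tool schemas generated from your function signatures.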

Embeddings

from llamaindex_doubleword import DoublewordEmbedding

embed_model = DoublewordEmbedding(
    model_name="Qwen/Qwen3-Embedding-8B",
    api_key="{{apiKey}}",
)

embedding = embed_model.get_text_embedding("Hello world")
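The returned embedding is a plain list of floats, so you can compare texts directly with cosine similarity. A minimal sketch (the short vectors below are made up for illustration; real embeddings from `get_text_embedding` have far more dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Illustrative stand-ins for embed_model.get_text_embedding(...) results.
v_hello = [0.1, 0.3, 0.5]
v_hi = [0.2, 0.25, 0.55]
v_cat = [0.9, -0.4, 0.1]

# Texts with similar meaning should score closer to 1.0.
print(cosine_similarity(v_hello, v_hi) > cosine_similarity(v_hello, v_cat))
# → True
```

In practice you rarely compute this by hand: a `VectorStoreIndex` (shown below under "Using with LlamaIndex") handles similarity search for you.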

Batch pricing with Autobatcher

For background tasks where latency is not critical, use the batch variants to transparently route requests through the Batch API at reduced cost:

pip install llamaindex-doubleword autobatcher

import asyncio

from llamaindex_doubleword import DoublewordLLMBatch

llm = DoublewordLLMBatch(
    model="{{selectedModel.id}}",
    api_key="{{apiKey}}",
)

async def main():
    response = await llm.acomplete("Say hello.")
    print(response.text)

asyncio.run(main())

The batch variants (DoublewordLLMBatch, DoublewordEmbeddingBatch) are async-only. They collect concurrent requests and submit them as batch jobs automatically, cutting inference costs by up to 90%.
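Because batching happens per concurrent request, the usual pattern is to fire many requests at once with asyncio.gather. The concurrency pattern itself can be sketched standalone; here fake_acomplete is a hypothetical stand-in for llm.acomplete so the snippet runs without an API key:

```python
import asyncio

# Hypothetical stand-in for llm.acomplete, so the pattern runs standalone.
async def fake_acomplete(prompt: str) -> str:
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"echo: {prompt}"

async def main():
    prompts = [f"Summarize document {i}" for i in range(5)]
    # Issuing requests concurrently is what lets the batch variants
    # collect them into a single batch job under the hood.
    return await asyncio.gather(*(fake_acomplete(p) for p in prompts))

results = asyncio.run(main())
print(len(results))
# → 5
```

With DoublewordLLMBatch, replace fake_acomplete with llm.acomplete and the same gather call becomes one batch submission instead of five separate real-time requests.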

Using with LlamaIndex

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llamaindex_doubleword import DoublewordEmbedding, DoublewordLLM

Settings.llm = DoublewordLLM(model="{{selectedModel.id}}", api_key="{{apiKey}}")
Settings.embed_model = DoublewordEmbedding(model_name="Qwen/Qwen3-Embedding-8B", api_key="{{apiKey}}")

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is this about?")
print(response)

Try it end-to-end

A full tool-calling agent example lives in the repo at examples/agent-basic/. It runs a calculator agent against concurrent arithmetic queries, with real-time and batched variants side by side.