LlamaIndex
The llamaindex-doubleword package provides Doubleword LLM and embedding models for LlamaIndex, with both real-time and batch variants.
Install
```shell
pip install llamaindex-doubleword
```
Chat / Completions
```python
from llamaindex_doubleword import DoublewordLLM

llm = DoublewordLLM(
    model="{{selectedModel.id}}",
    api_key="{{apiKey}}",
)

response = llm.complete("Say hello.")
print(response.text)
```
Tool calling
DoublewordLLM supports function calling, so you can use it with LlamaIndex's agent framework. The model decides when to call tools, receives the results, and formulates a final answer:
```python
import asyncio

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.tools import FunctionTool
from llamaindex_doubleword import DoublewordLLM

def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

llm = DoublewordLLM(
    model="{{selectedModel.id}}",
    api_key="{{apiKey}}",
)

agent = AgentWorkflow.from_tools_or_functions(
    [FunctionTool.from_defaults(fn=calculator)],
    llm=llm,
)

async def main():
    # AgentWorkflow.run is async and must be awaited.
    response = await agent.run("What is 137 * 49?")
    print(response)

asyncio.run(main())
```
The agent runs a model→tool→model loop until the model produces a final answer without requesting any more tool calls.
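That loop is easy to picture in plain Python with a stand-in model. Note that `FakeModel`, `run_agent`, and the message shapes below are purely illustrative — they are not llamaindex-doubleword or LlamaIndex APIs — and exist only to show the control flow the agent framework handles for you:

```python
# Schematic of the model->tool->model loop with a stand-in model.
# None of this is package API; it only illustrates the control flow.

def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

class FakeModel:
    """Requests the calculator once, then answers from the tool result."""
    def chat(self, messages):
        last = messages[-1]
        if last["role"] == "user":
            # First turn: the model decides to call a tool.
            return {"tool_call": {"name": "calculator",
                                  "args": {"expression": "137 * 49"}}}
        # Second turn: the tool result is in context, so answer directly.
        return {"content": f"137 * 49 = {last['content']}"}

def run_agent(model, user_msg):
    messages = [{"role": "user", "content": user_msg}]
    while True:
        reply = model.chat(messages)
        if "tool_call" not in reply:
            return reply["content"]  # final answer: stop looping
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})

print(run_agent(FakeModel(), "What is 137 * 49?"))
```

The loop terminates exactly as described above: as soon as a reply contains no tool call, it is treated as the final answer.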
Embeddings
```python
from llamaindex_doubleword import DoublewordEmbedding

embed_model = DoublewordEmbedding(
    model_name="Qwen/Qwen3-Embedding-8B",
    api_key="{{apiKey}}",
)

embedding = embed_model.get_text_embedding("Hello world")
```
Batch pricing with Autobatcher
For background tasks where latency is not critical, use the batch variants to transparently route requests through the Batch API at reduced cost:
```shell
pip install llamaindex-doubleword autobatcher
```
```python
import asyncio

from llamaindex_doubleword import DoublewordLLMBatch

llm = DoublewordLLMBatch(
    model="{{selectedModel.id}}",
    api_key="{{apiKey}}",
)

async def main():
    response = await llm.acomplete("Say hello.")
    print(response.text)

asyncio.run(main())
```
The batch variants (DoublewordLLMBatch, DoublewordEmbeddingBatch) are async-only. They collect concurrent requests and submit them as batch jobs automatically, cutting inference costs by up to 90%.
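The collection mechanism can be pictured with a toy micro-batcher: concurrent awaits park on futures, and a single flusher submits everything that accumulated in a short window as one batch. This `ToyBatcher` is an illustration of the idea only, not Autobatcher's actual implementation:

```python
import asyncio

class ToyBatcher:
    """Collects concurrent requests and processes them as one batch."""
    def __init__(self, window=0.05):
        self.window = window
        self.pending = []    # (prompt, future) pairs awaiting a flush
        self.flusher = None

    async def complete(self, prompt):
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((prompt, fut))
        if self.flusher is None:
            self.flusher = asyncio.create_task(self._flush())
        return await fut

    async def _flush(self):
        await asyncio.sleep(self.window)   # let concurrent requests pile up
        batch, self.pending, self.flusher = self.pending, [], None
        # One "batch job" for everything collected in the window.
        results = [f"echo: {p}" for p, _ in batch]
        for (_, fut), res in zip(batch, results):
            fut.set_result(res)

async def main():
    batcher = ToyBatcher()
    out = await asyncio.gather(*(batcher.complete(f"req {i}") for i in range(3)))
    print(out)

asyncio.run(main())
```

From the caller's side each `complete` looks like an ordinary await; the batching happens entirely behind the future.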
Using with LlamaIndex
```python
from llama_index.core import Settings, VectorStoreIndex
from llamaindex_doubleword import DoublewordEmbedding, DoublewordLLM

Settings.llm = DoublewordLLM(model="{{selectedModel.id}}", api_key="{{apiKey}}")
Settings.embed_model = DoublewordEmbedding(
    model_name="Qwen/Qwen3-Embedding-8B",
    api_key="{{apiKey}}",
)

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is this about?")
```
Try it end-to-end
A full tool-calling agent example lives in the repo at examples/agent-basic/. It runs a calculator agent against concurrent arithmetic queries, with real-time and batched variants side by side.
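The concurrency pattern that example relies on boils down to `asyncio.gather` over many agent runs. A stub version of it, where the `answer` coroutine stands in for a real `await agent.run(query)` call, looks like:

```python
import asyncio

# Stand-in for `await agent.run(query)` -- the repo example calls the
# real agent here; this stub just does the arithmetic itself.
async def answer(query: str) -> str:
    a, b = (int(x) for x in query.split("*"))
    return f"{query.strip()} = {a * b}"

async def main():
    queries = ["3 * 7", "12 * 12", "137 * 49"]
    # Fire all queries concurrently; with the batch variants, the
    # underlying requests get coalesced into a single batch job.
    results = await asyncio.gather(*(answer(q) for q in queries))
    for r in results:
        print(r)

asyncio.run(main())
```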