Create AI-powered chat completions using multiple models and consensus-driven responses.
The Sup AI Chat Completions API is fully compatible with the OpenAI chat completions format, with additional features for multi-model consensus and intelligent mode selection.
An array of messages in the conversation. It must contain at least one non-system message and end with a user message. Each message is an object with a `role` and `content`:
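For illustration, a messages array that satisfies these rules (the content is arbitrary; the system message is optional, but the final message must come from the user):

```python
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Explain HTTP caching in two sentences."},
]
```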
Whether to include Sup AI-specific chunk data in the stream. Enables access to thinking tokens, web search results,
confidence scores, and other advanced features.
When `include_supai_chunks` is true, additional chunks provide insight into Sup AI’s multi-model orchestration; the `main` key identifies the orchestrator thread. All Sup AI chunks have `object: "supai.chunk"` and contain a `chunk` property with the following types:
{ "object": "supai.chunk", "chunk": { "type": "text", "text": "Hello, how can I help you today?", "_source": { "key": "main", "modelId": "anthropic/claude-sonnet-4.5" } }}
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sup.ai/v1/openai",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sup.ai/v1/openai",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="thinking",
    stream=True,
    messages=[
        {"role": "user", "content": "Write a haiku about programming."}
    ],
    extra_body={"include_supai_chunks": True}
)

for chunk in stream:
    # Check if this is a Sup AI chunk or a standard OpenAI chunk
    if chunk.object == "supai.chunk":
        supai_chunk = chunk.chunk
        if supai_chunk["type"] == "thinking":
            # Intermediate reasoning tokens from the model
            print(f"[thinking] {supai_chunk['text']}")
        elif supai_chunk["type"] == "run-mode-call":
            # Which mode ran and how many models it launched
            print(f"[mode] {supai_chunk['modeId']} with {len(supai_chunk['startModelIds'])} models")
    elif chunk.object == "chat.completion.chunk":
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
```
- Use `auto` for most cases; it intelligently selects the optimal mode
- Use `fast` for simple, low-stakes tasks
- Use `thinking` for typical development work
- Use `deep-thinking` for complex architectural decisions
- Use `pro` for high-stakes, regulatory, or compliance work
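Because the mode is simply the `model` field, switching modes is a one-line change. The sketch below maps task types to modes; the task names and the mapping itself are illustrative, not part of the API:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

# Map task risk to a Sup AI mode; this mapping is an illustrative choice.
MODE_BY_TASK = {
    "autocomplete": "fast",           # simple, low-stakes
    "code-review": "thinking",        # typical development work
    "architecture": "deep-thinking",  # complex design decisions
    "compliance": "pro",              # high-stakes, regulated output
}

def complete(task: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODE_BY_TASK.get(task, "auto"),  # fall back to automatic selection
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```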
Optimize for streaming
Enable `stream: true` for a better user experience in interactive applications. Tokens arrive as they are generated, so the response starts immediately rather than waiting for the full completion.
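A minimal streaming loop using only standard OpenAI chunk fields (no Sup AI extensions) might look like this:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

stream = client.chat.completions.create(
    model="auto",
    stream=True,
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
)

# Print tokens as they arrive instead of waiting for the full completion.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```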
Provide context with environment
Set `environment.date` and `environment.location` for time-sensitive or location-aware responses. This helps models provide more relevant information.
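Because `environment` is a Sup AI extension, it is passed through `extra_body` when using the OpenAI SDK, mirroring the `include_supai_chunks` example above. The value formats shown here (an ISO 8601 date and a free-text location) are assumptions for illustration:

```python
from datetime import date
from openai import OpenAI

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What local events are happening this weekend?"}],
    # environment is a Sup AI extension; the value formats below are assumed.
    extra_body={
        "environment": {
            "date": date.today().isoformat(),  # e.g. "2025-06-01"
            "location": "Berlin, Germany",
        }
    },
)

print(response.choices[0].message.content)
```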
Use system messages effectively
System messages set the AI’s behavior for the entire conversation. Be specific about the assistant’s role,
expertise, and any constraints.
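For example, a system message that pins down the assistant's role, expertise, and constraints (the wording is illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

# A specific system message: role, expertise, and constraints in one place.
SYSTEM_PROMPT = (
    "You are a senior Python backend engineer. "
    "Answer with working code first, then a short explanation. "
    "If a question is outside backend development, say so instead of guessing."
)

response = client.chat.completions.create(
    model="thinking",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How should I paginate a large SQLAlchemy query?"},
    ],
)

print(response.choices[0].message.content)
```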