The Sup AI Chat Completions API is fully compatible with the OpenAI chat completions format, with additional features for multi-model consensus and intelligent mode selection.

Endpoint

POST https://api.sup.ai/v1/openai/chat/completions

Authentication

All requests require a Bearer token in the Authorization header:
Authorization: Bearer YOUR_API_KEY
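For example, a raw HTTP request using the requests library (a minimal sketch of the endpoint and Bearer auth described above):

import requests

response = requests.post(
    "https://api.sup.ai/v1/openai/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])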

Request Body

messages
array
required
An array of messages in the conversation. Must contain at least one non-system message and end with a user message. Each message is an object with a role and content:
Role        Description
system      Sets behavior and context for the AI
user        Messages from the user
assistant   Previous AI responses
model
string
default:"auto"
The mode ID to use for generation (for example auto, fast, thinking, deep-thinking, or pro; see Best Practices below). The field is named "model" for compatibility with the OpenAI API, even though it selects a mode rather than a single model.
models
array | null
default:"null"
Specific model IDs to use. If null, all non-deprecated models are available for selection. Example model IDs:
  • anthropic/claude-sonnet-4.5
  • openai/gpt-5.2
  • google/gemini-3-flash
  • xai/grok-4
stream
boolean
default:"false"
Whether to stream the response using Server-Sent Events.
stream_options
object
Options for streaming responses, such as include_usage to receive token usage on the final chunk.
environment
object
User environment context, such as the current date and location, used to personalize responses.
include_supai_chunks
boolean
default:"false"
Whether to include Sup AI-specific chunk data in the stream. Enables access to thinking tokens, web search results, confidence scores, and other advanced features.
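
Putting the fields together, a complete request body might look like this (the environment keys date and location follow the Best Practices section below; other environment keys are undocumented here):

{
  "model": "auto",
  "models": ["anthropic/claude-sonnet-4.5", "openai/gpt-5.2"],
  "stream": true,
  "stream_options": { "include_usage": true },
  "include_supai_chunks": true,
  "environment": { "date": "2024-01-15", "location": "Berlin, Germany" },
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ]
}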

Message Types

System Message

Provides instructions and context that guide the AI’s behavior throughout the conversation.
{ "role": "system", "content": "You are a helpful assistant specialized in Python programming." }
Or with structured content:
{ "role": "system", "content": [{ "type": "text", "text": "You are a helpful assistant." }] }

User Message

Messages from the user. Supports text and images. Text only:
{ "role": "user", "content": "Explain the difference between async and sync programming." }
With images:
{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "https://example.com/image.png" } }
  ]
}
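With the Python SDK, the same multimodal message can be sent like this (a minimal sketch; the image URL is a placeholder):

from openai import OpenAI

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="auto",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
        ],
    }],
)
print(response.choices[0].message.content)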

Assistant Message

Previous AI responses in the conversation history.
{ "role": "assistant", "content": "Async programming allows concurrent execution without blocking..." }

Response Format

Non-Streaming Response

When stream is false, the response is a JSON object:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705312200,
  "model": "auto",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "The capital of France is Paris." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 50, "completion_tokens": 150, "total_tokens": 200 }
}

Streaming Response

When stream is true, the response is a stream of Server-Sent Events (SSE).
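On the wire, each event is a data: line carrying one JSON chunk, and the stream is terminated by data: [DONE] (an abridged sketch; some fields omitted):

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The "},"finish_reason":null}]}

data: [DONE]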

Standard OpenAI Chunk

The primary chunk format, compatible with OpenAI clients:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1705312200,
  "model": "auto",
  "choices": [{ "index": 0, "delta": { "role": "assistant", "content": "The " }, "finish_reason": null }]
}
First chunk includes role:
{
  "choices": [{ "delta": { "role": "assistant" }, "finish_reason": null, "index": 0 }],
  "created": 1705312200,
  "id": "chatcmpl-abc123",
  "model": "auto",
  "object": "chat.completion.chunk"
}
Content chunks include content:
{
  "choices": [{ "delta": { "content": "capital" }, "finish_reason": null, "index": 0 }],
  "created": 1705312200,
  "id": "chatcmpl-abc123",
  "model": "auto",
  "object": "chat.completion.chunk"
}
Final chunk includes finish_reason:
{
  "choices": [{ "delta": {}, "finish_reason": "stop", "index": 0 }],
  "created": 1705312200,
  "id": "chatcmpl-abc123",
  "model": "auto",
  "object": "chat.completion.chunk",
  "usage": { "prompt_tokens": 50, "completion_tokens": 150, "total_tokens": 200 }
}
The usage field is only included when stream_options.include_usage is true.
Stream termination:
data: [DONE]
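
To receive the usage object on the final chunk with the Python SDK, pass stream_options when creating the stream (a minimal sketch):

from openai import OpenAI

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

stream = client.chat.completions.create(
    model="auto",
    stream=True,
    stream_options={"include_usage": True},
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
    if chunk.usage:  # populated only on the final chunk
        print(f"\n[usage] {chunk.usage.total_tokens} tokens")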

Sup AI-Specific Chunks

When include_supai_chunks is true, additional chunks provide insight into Sup AI’s multi-model orchestration. All Sup AI chunks have object: "supai.chunk" and contain a chunk property with one of the following types. Where present, the _source.key field identifies the internal thread that produced the chunk; the key main denotes the orchestrator thread.

Text Chunk

Generated text content from a model.
{
  "object": "supai.chunk",
  "chunk": {
    "type": "text",
    "text": "Hello, how can I help you today?",
    "_source": { "key": "main", "modelId": "anthropic/claude-sonnet-4.5" }
  }
}

Thinking Chunk

Internal reasoning from models with thinking capabilities.
{
  "object": "supai.chunk",
  "chunk": {
    "type": "thinking",
    "text": "Let me analyze the user's request step by step...",
    "_source": { "key": "main", "modelId": "anthropic/claude-sonnet-4.5" }
  }
}

Start/Done Chunks

Signal the beginning and end of a model’s generation.
{ "object": "supai.chunk", "chunk": { "type": "start", "_source": { "key": "main" } } }
{ "object": "supai.chunk", "chunk": { "type": "done", "_source": { "key": "main" } } }

Mode Selection Chunks

Indicate which mode and models are being used. Run Mode Call — Mode selection started:
{
  "object": "supai.chunk",
  "chunk": {
    "type": "run-mode-call",
    "modeId": "thinking",
    "startModelIds": ["anthropic/claude-sonnet-4.5", "openai/gpt-5.2", "google/gemini-3-flash"]
  }
}
Run Mode Result — Final models used:
{
  "object": "supai.chunk",
  "chunk": {
    "type": "run-mode-result",
    "finalModelIds": ["anthropic/claude-sonnet-4.5", "openai/gpt-5.2", "google/gemini-3-flash"]
  }
}

Consensus Chunk

Indicates consensus is being generated from multiple model outputs.
{ "object": "supai.chunk", "chunk": { "type": "run-consensus", "modelId": "anthropic/claude-sonnet-4.5" } }

Confidence Score Chunk

A confidence score for the generated response (0-1).
{ "object": "supai.chunk", "chunk": { "type": "confidence-score", "value": 0.92, "_source": { "key": "main" } } }

Web Search Chunks

Results from web search tool calls. Search initiated:
{
  "object": "supai.chunk",
  "chunk": {
    "type": "web-search-call",
    "query": "latest TypeScript features",
    "toolCallId": "call_abc123",
    "_source": { "key": "main", "modelId": "anthropic/claude-sonnet-4.5" }
  }
}
Search results:
{
  "object": "supai.chunk",
  "chunk": {
    "type": "web-search-result",
    "query": "latest TypeScript features",
    "results": [
      { "title": "TypeScript 5.4 Release Notes", "url": "https://..." },
      { "title": "What's New in TypeScript", "url": "https://..." }
    ],
    "toolCallId": "call_abc123",
    "_source": { "key": "main", "modelId": "anthropic/claude-sonnet-4.5" }
  }
}

URL Fetch Chunks

Results from fetching URL content. Fetch initiated:
{
  "object": "supai.chunk",
  "chunk": {
    "type": "fetch-url-call",
    "url": "https://example.com/article",
    "toolCallId": "call_def456",
    "_source": { "key": "main", "modelId": "anthropic/claude-sonnet-4.5" }
  }
}
Fetch result:
{
  "object": "supai.chunk",
  "chunk": {
    "type": "fetch-url-result",
    "contentType": "text",
    "title": "Example Article",
    "toolCallId": "call_def456",
    "_source": { "key": "main", "modelId": "anthropic/claude-sonnet-4.5" }
  }
}

Source Chunk

Citation sources referenced in the response.
{
  "object": "supai.chunk",
  "chunk": {
    "type": "source",
    "source": { "type": "url", "url": "https://docs.example.com/guide", "title": "Official Documentation" },
    "_source": { "key": "main", "modelId": "anthropic/claude-sonnet-4.5" }
  }
}

Error Chunk

Indicates an error occurred during generation.
{
  "object": "supai.chunk",
  "chunk": {
    "type": "error",
    "message": "Rate limit exceeded",
    "retryAt": "2024-01-15T10:35:00Z",
    "_source": { "key": "main", "modelId": "anthropic/claude-sonnet-4.5" }
  }
}
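When consuming these chunks in Python, a simple dispatch on the chunk type covers the variants above. This sketch assumes each chunk has already been parsed into a dict, as in the streaming example below:

def handle_supai_chunk(chunk: dict) -> None:
    # Dispatch on the documented chunk types; unknown types are ignored.
    kind = chunk["type"]
    if kind == "text":
        print(chunk["text"], end="")
    elif kind == "thinking":
        print(f"[thinking] {chunk['text']}")
    elif kind == "confidence-score":
        print(f"[confidence] {chunk['value']:.2f}")
    elif kind == "web-search-result":
        for result in chunk["results"]:
            print(f"[search] {result['title']}: {result['url']}")
    elif kind == "source":
        print(f"[source] {chunk['source']['title']}: {chunk['source']['url']}")
    elif kind == "error":
        # retryAt, when present, indicates when the request can be retried.
        print(f"[error] {chunk['message']} (retry at {chunk.get('retryAt')})")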

Example: Basic Request

from openai import OpenAI

client = OpenAI(
  base_url="https://api.sup.ai/v1/openai",
  api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
  model="auto",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ]
)

print(response.choices[0].message.content)

Example: Streaming

from openai import OpenAI

client = OpenAI(
  base_url="https://api.sup.ai/v1/openai",
  api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
  model="thinking",
  stream=True,
  messages=[
    {"role": "user", "content": "Write a haiku about programming."}
  ],
  extra_body={"include_supai_chunks": True}
)

for chunk in stream:
  # Check if this is a Sup AI chunk or standard OpenAI chunk
  if chunk.object == "supai.chunk":
    supai_chunk = chunk.chunk
    if supai_chunk["type"] == "thinking":
      print(f"[thinking] {supai_chunk['text']}")
    elif supai_chunk["type"] == "run-mode-call":
      print(f"[mode] {supai_chunk['modeId']} with {len(supai_chunk['startModelIds'])} models")
  elif chunk.object == "chat.completion.chunk":
    if chunk.choices[0].delta.content:
      print(chunk.choices[0].delta.content, end="")

Error Responses

Status   Description
400      Bad request - invalid input parameters
401      Unauthorized - missing or invalid API key
402      Payment required - insufficient credits
500      Internal server error
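
With the Python SDK, these statuses surface as typed exceptions. A sketch of handling them (the exception classes come from the openai package; 402 has no dedicated class and falls through to the generic status error):

from openai import OpenAI, AuthenticationError, BadRequestError, APIStatusError

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

try:
    response = client.chat.completions.create(
        model="auto",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except BadRequestError as e:       # 400: invalid input parameters
    print(f"Bad request: {e}")
except AuthenticationError as e:   # 401: missing or invalid API key
    print(f"Authentication failed: {e}")
except APIStatusError as e:        # 402, 500, and any other status
    print(f"API error {e.status_code}: {e}")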

Best Practices

  • Use auto for most cases: it intelligently selects the optimal mode
  • Use fast for simple, low-stakes tasks
  • Use thinking for typical development work
  • Use deep-thinking for complex architectural decisions
  • Use pro for high-stakes, regulatory, or compliance work
  • Enable stream: true for better user experience in interactive applications; the response starts immediately rather than waiting for full generation.
  • Set environment.date and environment.location for time-sensitive or location-aware responses; this helps models provide more relevant information (see the sketch below).
  • System messages set the AI’s behavior for the entire conversation; be specific about the assistant’s role, expertise, and any constraints.
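
Since environment is not a standard OpenAI parameter, pass it through extra_body with the Python SDK, as the streaming example does for include_supai_chunks (a sketch using the date and location keys named above; the values are placeholders):

from openai import OpenAI

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's happening nearby this week?"}],
    extra_body={"environment": {"date": "2024-01-15", "location": "Berlin, Germany"}},
)
print(response.choices[0].message.content)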