Create AI-powered chat completions using multiple models and consensus-driven responses.
The Sup AI Chat Completions API is fully compatible with the OpenAI chat completions format, with additional features for multi-model consensus and intelligent mode selection.
An array of messages in the conversation. It must contain at least one non-system message and end with a user message. Each message is an object with a `role` and `content`:
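For illustration, a messages array that satisfies these rules (the content is arbitrary; the system message is optional, but the final message must come from the user):

```python
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Explain HTTP caching in two sentences."},
]
```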
Whether to include Sup AI-specific chunk data in the stream. Enables access to thinking tokens, web search results,
confidence scores, and other advanced features.
When `include_supai_chunks` is true, additional chunks provide insight into Sup AI’s multi-model orchestration; the `main` key identifies the orchestrator thread. All Sup AI chunks have `object: "supai.chunk"` and contain a `chunk` property with the following types:
{ "object": "supai.chunk", "chunk": { "type": "text", "text": "Hello, how can I help you today?", "_source": { "key": "main", "modelId": "anthropic/claude-sonnet-4.5" } }}
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sup.ai/v1/openai",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sup.ai/v1/openai",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="thinking",
    stream=True,
    messages=[
        {"role": "user", "content": "Write a haiku about programming."}
    ],
    extra_body={"include_supai_chunks": True}
)

for chunk in stream:
    # Check if this is a Sup AI chunk or a standard OpenAI chunk
    if chunk.object == "supai.chunk":
        supai_chunk = chunk.chunk
        if supai_chunk["type"] == "thinking":
            # Intermediate reasoning tokens from the model
            print(f"[thinking] {supai_chunk['text']}")
        elif supai_chunk["type"] == "run-mode-call":
            # Which mode ran and how many models it launched
            print(f"[mode] {supai_chunk['modeId']} with {len(supai_chunk['startModelIds'])} models")
    elif chunk.object == "chat.completion.chunk":
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
```
- Use `auto` for most cases; it intelligently selects the optimal mode
- Use `fast` for simple, low-stakes tasks
- Use `thinking` for typical development work
- Use `deep-thinking` for complex architectural decisions
- Use `pro` for high-stakes, regulatory, or compliance work
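Because the mode is simply the `model` field, switching modes is a one-line change. The sketch below maps task types to modes; the task names and the mapping itself are illustrative, not part of the API:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

# Map task risk to a Sup AI mode; this mapping is an illustrative choice.
MODE_BY_TASK = {
    "autocomplete": "fast",           # simple, low-stakes
    "code-review": "thinking",        # typical development work
    "architecture": "deep-thinking",  # complex design decisions
    "compliance": "pro",              # high-stakes, regulated output
}

def complete(task: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODE_BY_TASK.get(task, "auto"),  # fall back to automatic selection
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```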
Optimize for streaming
Enable `stream: true` for a better user experience in interactive applications. Tokens arrive as they are generated, so the response starts immediately rather than waiting for the full completion.
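A minimal streaming loop using only standard OpenAI chunk fields (no Sup AI extensions) might look like this:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

stream = client.chat.completions.create(
    model="auto",
    stream=True,
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
)

# Print tokens as they arrive instead of waiting for the full completion.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```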
Provide context with environment
Set `environment.date` and `environment.location` for time-sensitive or location-aware responses. This helps models provide more relevant information.
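Because `environment` is a Sup AI extension, it is passed through `extra_body` when using the OpenAI SDK, mirroring the `include_supai_chunks` example above. The value formats shown here (an ISO 8601 date and a free-text location) are assumptions for illustration:

```python
from datetime import date
from openai import OpenAI

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What local events are happening this weekend?"}],
    # environment is a Sup AI extension; the value formats below are assumed.
    extra_body={
        "environment": {
            "date": date.today().isoformat(),  # e.g. "2025-06-01"
            "location": "Berlin, Germany",
        }
    },
)

print(response.choices[0].message.content)
```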
Use system messages effectively
System messages set the AI’s behavior for the entire conversation. Be specific about the assistant’s role,
expertise, and any constraints.
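For example, a system message that pins down the assistant's role, expertise, and constraints (the wording is illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.sup.ai/v1/openai", api_key="YOUR_API_KEY")

# A specific system message: role, expertise, and constraints in one place.
SYSTEM_PROMPT = (
    "You are a senior Python backend engineer. "
    "Answer with working code first, then a short explanation. "
    "If a question is outside backend development, say so instead of guessing."
)

response = client.chat.completions.create(
    model="thinking",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How should I paginate a large SQLAlchemy query?"},
    ],
)

print(response.choices[0].message.content)
```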