Parallel Chat is a web research API that returns OpenAI ChatCompletions-compatible streaming text and JSON. The Chat API supports multiple models: the speed model for low latency across a broad range of use cases, and the research models (lite, base, core) for deeper, research-grade outputs with full research basis support, where you can afford to wait longer for a more comprehensive response.
Beta Notice: Parallel Chat is in beta. We provide a rate limit of 300 requests per minute for the Chat API out of the box. Contact us for production capacity.

Choosing the Right Model

The Chat API supports both the speed model for low latency applications and research models for deeper outputs. Research models (lite, base, core) are Chat API wrappers over our Task API processors, providing the same research capabilities along with basis in an OpenAI-compatible interface.
| Model | Best For | Basis Support | Latency (TTFT) |
| --- | --- | --- | --- |
| speed | Low latency across a broad range of use cases | No | ~3s |
| lite | Simple lookups, basic metadata | Yes | 10-60s |
| base | Standard enrichments, factual queries | Yes | 15-100s |
| core | Complex research, multi-source synthesis | Yes | 60s-5min |
Use speed when low latency matters most. Use the research models (lite, base, core) for research-intensive workflows where you can wait longer for a deeper response with citations, reasoning, and confidence levels via the research basis.

1. Set up Prerequisites

The Chat API is fully compatible with the OpenAI SDK — just swap the base URL and API key. Generate your API key on Platform, then install the OpenAI SDK:
pip install openai
export PARALLEL_API_KEY="your-api-key"

Performance and Rate Limits

The speed model is optimized for interactive applications that require low-latency responses:
  • Performance: With stream=true, achieves a ~3-second p50 TTFT (median time to first token)
  • Default Rate Limit: 300 requests per minute
  • Use Cases: Chat interfaces, interactive tools
For research-based tasks where latency is not the primary concern, use one of the research models. For production deployments requiring higher rate limits, contact our team.

2. Make Your First Request

curl -N https://api.parallel.ai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PARALLEL_API_KEY" \
  -d '{
    "model": "speed",
    "messages": [
      { "role": "user", "content": "What does Parallel Web Systems do?" }
    ],
    "stream": false,
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "reasoning_schema",
        "schema": {
          "type": "object",
          "properties": {
            "reasoning": {
              "type": "string",
              "description": "Think step by step to arrive at the answer"
            },
            "answer": {
              "type": "string",
              "description": "The direct answer to the question"
            },
            "citations": {
              "type": "array",
              "items": { "type": "string" },
              "description": "Sources cited to support the answer"
            }
          }
        }
      }
    }
  }'
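
With a json_schema response_format, the assistant message content comes back as a JSON string conforming to your schema. A parsing sketch in Python (the sample string below is illustrative, not real API output):

```python
# Parse the structured answer returned under the json_schema response_format.
import json

# Illustrative stand-in for response.choices[0].message.content:
sample_content = (
    '{"reasoning": "Checked the company website.", '
    '"answer": "Parallel builds web research APIs.", '
    '"citations": ["https://parallel.ai"]}'
)

result = json.loads(sample_content)
print(result["answer"])
for source in result["citations"]:
    print("cited:", source)
```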

System Prompt

You can provide a custom system prompt to control the AI’s behavior and response style by including it in the messages array with "role": "system" as the first message in your request.
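
For example, a request payload with a system prompt might look like this (the prompt text is illustrative):

```python
# Compose a request with a system prompt as the first message.
import json

payload = {
    "model": "speed",
    "messages": [
        {
            "role": "system",
            "content": "You are a concise research assistant. Always cite sources.",
        },
        {"role": "user", "content": "What does Parallel Web Systems do?"},
    ],
    "stream": False,
}
print(json.dumps(payload, indent=2))
```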

Using Research Models

When you use research models (lite, base, or core) instead of speed, the Chat API provides research-grade outputs with full research basis support. The basis includes citations, reasoning, and confidence levels for each response.

Example with Research Model

curl -N https://api.parallel.ai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PARALLEL_API_KEY" \
  -d '{
    "model": "base",
    "messages": [
      { "role": "user", "content": "What is the founding date and headquarters of Parallel Web Systems?" }
    ],
    "stream": false
  }'
For complete details on the research basis structure, including per-element basis for arrays, see the Basis documentation.

OpenAI SDK Compatibility

Research Basis via OpenAI SDK: When using task processors (lite, base, core) with the Chat API, the response includes a basis field with citations, reasoning, and confidence levels. Access it via response.basis in Python or (response as any).basis in TypeScript. See Basis documentation for details.
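
In the raw JSON body, basis appears as a top-level field alongside the standard ChatCompletions fields. A sketch of reading it from a decoded response dict (the payload shape below is illustrative only; see the Basis documentation for the real structure):

```python
# Read the Parallel-specific `basis` field from a decoded response body.
import json

# Illustrative shape only, not real API output.
body = json.loads("""
{
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "..."}}
  ],
  "basis": [
    {"citations": ["https://example.com"], "reasoning": "...", "confidence": "high"}
  ]
}
""")

# basis is present for research models (lite, base, core), absent for speed.
for entry in body.get("basis", []):
    print(entry["confidence"], entry["citations"])
```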

Important OpenAI Compatibility Limitations

API Behavior

Here are the most substantial differences from using OpenAI:
  • Multimodal input (images/audio) is not supported and will be ignored.
  • Prompt caching is not supported.
  • Most unsupported fields are silently ignored rather than producing errors. These are all documented below.

Detailed OpenAI Compatible API Support

Request Fields

Simple Fields
| Field | Support Status |
| --- | --- |
| model | Use speed, lite, base, or core |
| response_format | Fully supported |
| stream | Fully supported |
| max_tokens | Ignored |
| max_completion_tokens | Ignored |
| stream_options | Ignored |
| top_p | Ignored |
| parallel_tool_calls | Ignored |
| stop | Ignored |
| temperature | Ignored |
| n | Ignored |
| logprobs | Ignored |
| metadata | Ignored |
| prediction | Ignored |
| presence_penalty | Ignored |
| frequency_penalty | Ignored |
| seed | Ignored |
| service_tier | Ignored |
| audio | Ignored |
| logit_bias | Ignored |
| store | Ignored |
| user | Ignored |
| modalities | Ignored |
| top_logprobs | Ignored |
| reasoning_effort | Ignored |
Tools / Functions Fields
Tools are ignored.
Messages Array Fields
| Field | Support Status |
| --- | --- |
| messages[].role | Fully supported |
| messages[].content | String only |
| messages[].name | Fully supported |
| messages[].tool_calls | Ignored |
| messages[].tool_call_id | Ignored |
| messages[].function_call | Ignored |
| messages[].audio | Ignored |
| messages[].modalities | Ignored |
The content field only supports string values. Structured content arrays (e.g., for multimodal inputs with text and image parts) are not supported.

Response Fields

| Field | Support Status |
| --- | --- |
| id | Always empty |
| choices[] | Will always have a length of 1 |
| choices[].finish_reason | Always empty |
| choices[].index | Fully supported |
| choices[].message.role | Fully supported |
| choices[].message.content | Fully supported |
| choices[].message.tool_calls | Always empty |
| object | Always empty |
| created | Fully supported |
| model | Always empty |
| finish_reason | Always empty |
| content | Fully supported |
| usage.completion_tokens | Always empty |
| usage.prompt_tokens | Always empty |
| usage.total_tokens | Always empty |
| usage.completion_tokens_details | Always empty |
| usage.prompt_tokens_details | Always empty |
| choices[].message.refusal | Always empty |
| choices[].message.audio | Always empty |
| logprobs | Always empty |
| service_tier | Always empty |
| system_fingerprint | Always empty |
Parallel-Specific Response Fields
The following fields are Parallel extensions not present in the OpenAI API:
| Field | Support Status |
| --- | --- |
| basis | Supported with task processors (lite, base, core) |

Error Message Compatibility

The compatibility layer maintains approximately the same error formats as the OpenAI API.

Header Compatibility

While the OpenAI SDK automatically manages headers, here is the complete list of headers supported by Parallel’s API for developers who need to work with them directly.
| Field | Support Status |
| --- | --- |
| authorization | Fully supported |
| x-ratelimit-limit-requests | Ignored |
| x-ratelimit-limit-tokens | Ignored |
| x-ratelimit-remaining-requests | Ignored |
| x-ratelimit-remaining-tokens | Ignored |
| x-ratelimit-reset-requests | Ignored |
| x-ratelimit-reset-tokens | Ignored |
| retry-after | Ignored |
| x-request-id | Ignored |
| openai-version | Ignored |
| openai-processing-ms | Ignored |