Chat API Quickstart

Parallel Chat is a web research API that returns streaming text and JSON in a format compatible with OpenAI Chat Completions. The Chat API supports multiple models: the `speed` model for low latency across a broad range of use cases, and the research models (`lite`, `base`, `core`) for deeper, research-grade outputs with full research basis support when you can afford to wait longer for a more comprehensive response.

Beta Notice: Parallel Chat is in beta. The Chat API comes with a default rate limit of 300 requests per minute. Contact us for production capacity.

Choosing the Right Model

The Chat API supports both the `speed` model for low-latency applications and research models for deeper outputs. The research models (`lite`, `base`, `core`) are Chat API wrappers over our Task API processors: they provide the same research capabilities, including the research basis, through an OpenAI-compatible interface.

Model	Best For	Basis Support	Latency (TTFT)
`speed`	Low latency across a broad range of use cases	No	~3s
`lite`	Simple lookups, basic metadata	Yes	10-60s
`base`	Standard enrichments, factual queries	Yes	15-100s
`core`	Complex research, multi-source synthesis	Yes	60s-5min

Use `speed` for low latency across a broad range of use cases. Use a research model (`lite`, `base`, `core`) for research-intensive workflows where you can afford to wait longer for a deeper response that includes citations, reasoning, and confidence levels via the research basis.
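As a rough illustration of how the latency column can drive model choice, the hypothetical shell helper below (`pick_model` is not part of any Parallel tooling) returns `speed` when you do not need the research basis, and otherwise the deepest research model whose upper TTFT bound from the table fits your latency budget. The thresholds are copied from the table and are estimates, not guarantees.

```shell
# Hypothetical helper: choose a model for a latency budget (in seconds).
# Pass "basis" as the second argument if the research basis is required.
# Thresholds are the upper TTFT bounds from the model table:
# speed ~3s, lite 60s, base 100s, core 300s (5 min).
pick_model() {
  local budget=$1 need_basis=${2:-}
  if [ -z "$need_basis" ] && [ "$budget" -ge 3 ]; then
    echo speed                 # no basis needed: lowest latency wins
  elif [ "$budget" -ge 300 ]; then
    echo core                  # deepest research model that fits
  elif [ "$budget" -ge 100 ]; then
    echo base
  elif [ "$budget" -ge 60 ]; then
    echo lite
  else
    echo "no model fits a ${budget}s budget" >&2
    return 1
  fi
}
```

For example, `pick_model 120 basis` prints `base`, because 120 seconds covers the upper bound for `base` but not for `core`.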

Getting Started with the OpenAI SDK

To use the OpenAI SDK compatibility feature, you’ll need to:

1. Use an official OpenAI SDK.
2. Make these changes:
   - Update your base URL to point to Parallel’s beta API endpoint.
   - Replace your API key with a Parallel API key.
   - Update your model name to `speed`, `lite`, `base`, or `core`.
3. Review the documentation below for supported features.
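If you would rather not hard-code these values, recent versions of the official Python and Node.js OpenAI SDKs read their base URL and API key from the `OPENAI_BASE_URL` and `OPENAI_API_KEY` environment variables when none are passed to the client constructor. The sketch below sets both from `PARALLEL_API_KEY`. The base URL `https://api.parallel.ai` is an assumption inferred from the curl examples on this page, which post to `https://api.parallel.ai/chat/completions` (the SDKs append `/chat/completions` themselves); confirm it against the API reference. The model name still has to be set in your code.

```shell
# Hypothetical helper: configure an official OpenAI SDK for Parallel through
# environment variables instead of constructor arguments.
# Assumption: base URL https://api.parallel.ai, inferred from the curl examples.
use_parallel_with_openai_sdk() {
  if [ -z "${PARALLEL_API_KEY:-}" ]; then
    echo "PARALLEL_API_KEY is not set" >&2
    return 1
  fi
  export OPENAI_BASE_URL="https://api.parallel.ai"
  export OPENAI_API_KEY="$PARALLEL_API_KEY"
}
```

Run the function in the same shell session that later starts your SDK-based program, so the exported variables are inherited.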

Performance and Rate Limits

The `speed` model is optimized for interactive applications that need low-latency responses:

- Performance: with `"stream": true`, a p50 TTFT (median time to first token) of 3 seconds
- Default rate limit: 300 requests per minute
- Use cases: chat interfaces, interactive tools

For research-based tasks where latency is not the primary concern, use one of the research models. For production deployments that require higher rate limits, contact our team.
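When traffic approaches the default limit of 300 requests per minute, a client-side retry with exponential backoff keeps short bursts from failing outright. This is a minimal sketch under two assumptions: the Chat API signals rate limiting with HTTP status 429 (the common HTTP convention, not something this page states), and the wrapped command prints only the HTTP status code. `with_backoff` is a hypothetical helper name.

```shell
# Hypothetical helper: run a command that prints an HTTP status code and
# retry it with exponential backoff while the status is 429.
# Assumption: the Chat API uses HTTP 429 to signal rate limiting.
# Usage: with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
with_backoff() {
  local max_attempts=$1 delay=$2 attempt=1 status
  shift 2
  while :; do
    status=$("$@")
    if [ "$status" != "429" ]; then
      printf '%s\n' "$status"   # success or a non-rate-limit error
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      printf '%s\n' "$status"   # still rate limited; give up
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))        # double the wait before the next attempt
    attempt=$((attempt + 1))
  done
}
```

With curl, `-s -o response.json -w '%{http_code}'` writes the body to a file and prints only the status code, which fits this helper: `with_backoff 5 1 curl -s -o response.json -w '%{http_code}' https://api.parallel.ai/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer $PARALLEL_API_KEY" -d @body.json`, where `body.json` is a hypothetical file holding your request.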

Example Execution

curl -N https://api.parallel.ai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PARALLEL_API_KEY" \
  -d '{
    "model": "speed",
    "messages": [
      { "role": "user", "content": "What does Parallel Web Systems do?" }
    ],
    "stream": false,
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "reasoning_schema",
        "schema": {
          "type": "object",
          "properties": {
            "reasoning": {
              "type": "string",
              "description": "Think step by step to arrive at the answer"
            },
            "answer": {
              "type": "string",
              "description": "The direct answer to the question"
            },
            "citations": {
              "type": "array",
              "items": { "type": "string" },
              "description": "Sources cited to support the answer"
            }
          }
        }
      }
    }
  }'
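The request above uses `"stream": false`, so the full JSON body arrives at once. To get the low time to first token quoted for `speed`, set `"stream": true`; the `-N` flag already stops curl from buffering output. The sketch below assumes the stream follows the OpenAI convention of server-sent events, with each chunk on a `data: {...}` line and a final `data: [DONE]` line, and prints each chunk's JSON on its own line for a downstream tool to parse. `sse_payloads` is a hypothetical helper name.

```shell
# Hypothetical helper: read a server-sent-event stream on stdin and print
# each JSON payload on its own line, stopping at the end-of-stream marker.
# Assumption: OpenAI-style framing ("data: {...}" lines, then "data: [DONE]").
sse_payloads() {
  local line cr
  cr=$(printf '\r')
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$cr"}              # tolerate CRLF line endings
    case $line in
      "data: [DONE]") return 0 ;;   # end-of-stream marker
      "data: "*) printf '%s\n' "${line#"data: "}" ;;
      *) ;;                         # skip blank lines, comments, other fields
    esac
  done
}
```

For example: `curl -N https://api.parallel.ai/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer $PARALLEL_API_KEY" -d '{"model": "speed", "messages": [{"role": "user", "content": "What does Parallel Web Systems do?"}], "stream": true}' | sse_payloads`.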

System Prompt

You can provide a custom system prompt to control the model’s behavior and response style: include it as the first message in the `messages` array, with `"role": "system"`.
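The sketch below builds a request body with that ordering. `chat_body` and `json_escape` are hypothetical helpers; the escaping only handles backslashes and double quotes, so use a JSON tool such as `jq` for prompts that contain newlines or other control characters.

```shell
# Hypothetical helper: escape a single-line string for use inside JSON quotes.
# Handles backslashes and double quotes only; newlines and other control
# characters are not escaped.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build a request body with the system prompt first.
# Usage: chat_body MODEL SYSTEM_PROMPT USER_MESSAGE
chat_body() {
  printf '{"model":"%s","messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}],"stream":false}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

Pass the result to curl with `-d "$(chat_body speed 'Answer in one short paragraph.' 'What does Parallel Web Systems do?')"`.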

Using Research Models

When you use a research model (`lite`, `base`, or `core`) instead of `speed`, the Chat API returns research-grade outputs with full research basis support. The basis includes citations, reasoning, and confidence levels for each response.

Example with Research Model

curl -N https://api.parallel.ai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PARALLEL_API_KEY" \
  -d '{
    "model": "base",
    "messages": [
      { "role": "user", "content": "What is the founding date and headquarters of Parallel Web Systems?" }
    ],
    "stream": false
  }'
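Research models can take much longer to respond than `speed` (up to about 5 minutes to first token for `core`, per the model table), so make sure your HTTP client and any proxies allow for that. The hypothetical helper below suggests a value for curl's `--max-time` option per model. The numbers are assumptions (roughly twice the upper TTFT bound for the research models, and 30 seconds for `speed`), not limits published for the API.

```shell
# Hypothetical helper: suggest a client-side timeout in seconds per model.
# Assumed values: about 2x the upper TTFT bound from the model table for
# research models, and a fixed 30s for speed. Tune for your workload.
timeout_for_model() {
  case $1 in
    speed) echo 30 ;;
    lite)  echo 120 ;;
    base)  echo 200 ;;
    core)  echo 600 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}
```

For example, add `--max-time "$(timeout_for_model base)"` to the curl command for a `base` request.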

For complete details on the research basis structure, including per-element basis for arrays, see the Basis documentation.
