Build low-latency web research applications with OpenAI-compatible streaming chat completions
Parallel Chat is a web research API that returns OpenAI ChatCompletions-compatible streaming text and JSON.
The Chat API supports multiple models: the speed model for low latency across a
broad range of use cases, and research models (lite, base, core) for deeper, research-grade outputs
when you can afford to wait longer for more comprehensive responses with full research basis support.
Beta Notice: Parallel Chat is in beta. We provide a rate limit of 300 requests
per minute for the Chat API out of the box. Contact us
for production capacity.
The Chat API supports both the speed model for
low latency applications and research models for deeper outputs.
Research models (lite, base, core) are Chat API wrappers over our Task API processors,
providing the same research capabilities along with basis in an OpenAI-compatible interface.
| Model | Best For | Basis Support | Latency (TTFT) |
|-------|----------|---------------|----------------|
| speed | Low latency across a broad range of use cases | No | ~3s |
| lite | Simple lookups, basic metadata | Yes | 10-60s |
| base | Standard enrichments, factual queries | Yes | 15-100s |
| core | Complex research, multi-source synthesis | Yes | 60s-5min |
Use speed for low latency across a broad range of use cases.
Use research models (lite, base, core) for more research-intensive workflows
where you can afford to wait longer for an even deeper response with citations,
reasoning, and confidence levels via the research basis.
The Chat API is fully compatible with the OpenAI SDK — just swap the base URL and API key. Generate your API key on Platform, then install the OpenAI SDK:
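A minimal sketch of the swap described above, in Python. The SDK installs with `pip install openai`. The base URL default and the `PARALLEL_API_KEY` / `PARALLEL_BASE_URL` environment variable names are assumptions made for illustration, since this section does not state them; confirm the real base URL on Platform before relying on it.

```python
# pip install openai
import os
from typing import Optional

# Assumption: this section does not state the base URL. Verify it on Platform,
# or override it through the (hypothetical) PARALLEL_BASE_URL variable.
PARALLEL_BASE_URL = os.environ.get("PARALLEL_BASE_URL", "https://api.parallel.ai")


def client_kwargs(api_key: Optional[str] = None) -> dict:
    """Return the two settings that change when pointing the OpenAI SDK at Parallel."""
    if api_key is None:
        # Hypothetical variable name; raises KeyError if the key is not set.
        api_key = os.environ["PARALLEL_API_KEY"]
    return {"api_key": api_key, "base_url": PARALLEL_BASE_URL}


def ask(question: str, model: str = "speed") -> str:
    """Send one user message and return the reply text (performs a network call)."""
    from openai import OpenAI  # imported lazily so client_kwargs works without the SDK

    client = OpenAI(**client_kwargs())
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```

Calling `ask("...")` with the default arguments uses the speed model; passing `model="lite"`, `"base"`, or `"core"` selects a research model instead.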
Speed is optimized for interactive applications requiring low latency responses:
Performance: With stream=true, the speed model achieves a p50 (median) time to first token (TTFT) of about 3 seconds
Default Rate Limit: 300 requests per minute
Use Cases: Chat interfaces, interactive tools
For research-based tasks where latency is not the primary concern, use one of the research models.
For production deployments requiring higher rate limits, contact our team.
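To see how the TTFT figure above relates to a streaming request, the following sketch consumes a stream and records the time until the first non-empty token. It assumes `client` is an OpenAI SDK client already configured with Parallel's base URL and API key; the helper names `consume_stream` and `stream_speed` are made up for this example.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def consume_stream(
    chunks: Iterable,
    started_at: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[float], str]:
    """Collect streamed text and the seconds elapsed until the first non-empty token."""
    ttft = None
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        piece = chunk.choices[0].delta.content
        if piece:
            if ttft is None:
                ttft = clock() - started_at
            parts.append(piece)
    return ttft, "".join(parts)


def stream_speed(client, question: str) -> Tuple[Optional[float], str]:
    """Stream a reply from the speed model (performs a network call)."""
    started_at = time.monotonic()
    stream = client.chat.completions.create(
        model="speed",
        messages=[{"role": "user", "content": question}],
        stream=True,
    )
    return consume_stream(stream, started_at)
```

A single measurement from `stream_speed` is one sample, not a p50; a median would need many requests.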
You can provide a custom system prompt to control the AI’s behavior and response style by including it in the messages array with "role": "system" as the first message in your request.
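As a small illustration of the message ordering described above, this sketch builds a messages array with the system prompt first. The helper name `build_messages` and the example prompt text are hypothetical.

```python
from typing import Dict, List


def build_messages(system_prompt: str, user_message: str) -> List[Dict[str, str]]:
    """Place the system prompt first, followed by the user's message."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]


messages = build_messages(
    "Answer in two sentences and name your sources.",  # example prompt, not from the docs
    "What changed in the latest Python release?",
)
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create(...)`.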
When you use research models (lite, base, or core) instead of speed, the
Chat API provides research-grade outputs with
full research basis support.
The basis includes citations, reasoning, and confidence levels for each response.
Research Basis via OpenAI SDK: When using task processors (lite, base, core) with the Chat API, the response includes a basis field with citations, reasoning, and confidence levels. Access it via response.basis in Python or (response as any).basis in TypeScript. See Basis documentation for details.
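A hedged sketch of reading the basis in Python follows. It assumes `client` is an OpenAI SDK client configured for Parallel, and it treats the basis as an opaque value because its internal structure is not described in this section. The helper names are illustrative only.

```python
from typing import Any, Optional, Tuple

RESEARCH_MODELS = {"lite", "base", "core"}


def extract_basis(response: Any) -> Optional[Any]:
    """Return response.basis when present, else None (e.g. for the speed model)."""
    return getattr(response, "basis", None)


def research_answer(client, question: str, model: str = "core") -> Tuple[str, Optional[Any]]:
    """Ask a research model and return (answer text, basis); performs a network call."""
    if model not in RESEARCH_MODELS:
        raise ValueError(f"{model!r} is not a research model: {sorted(RESEARCH_MODELS)}")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content, extract_basis(response)
```

Using `getattr` with a default keeps the same code path working for responses that carry no basis field.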
While the OpenAI SDK automatically manages headers, here is the complete list of headers supported by Parallel’s API for developers who need to work with them directly.