Beta Notice: The Parallel Chat API is in beta. We provide a rate limit of 300 requests per minute for the Chat API out of the box. Contact us for production capacity.
Getting Started with the OpenAI SDK
To use the OpenAI SDK compatibility feature, you’ll need to:- Use an official OpenAI SDK
- Make these changes:
- Update your base URL to point to Parallel’s beta API endpoint
- Replace your API key with a Parallel API key
- Update your model name to “speed”
- Review the documentation below for supported features
Performance and Rate Limits
Speed is optimized for interactive applications requiring low latency responses:- Performance: With
stream=true
, achieves 3 second p50 TTFT (median time to first token) - Default Rate Limit: 300 requests per minute
- Use Cases: Chat interfaces, interactive tools