Search API
Retrieval engine purpose-built for AI agents
Overview
The Parallel Search API streamlines the traditional search → scrape → extract pipeline into a single, low-latency API—reducing token overhead and integration effort.
Designed for agentic systems and LLM-based workflows, the API returns ranked, compressed excerpts optimized for LLMs. Results are designed to serve directly as model input, enabling faster reasoning and higher-quality completions with less post-processing.
Alpha Notice: This API is currently in alpha and subject to change. Usage is limited to 1,000 requests per hour. For production access or higher capacity, contact support@parallel.ai.
Key Benefits
The Search API is ideal for LLM workflows, agents and retrieval-augmented tasks that use web information.
- Speed: Replace brittle multi-step pipelines with a single API call that reduces latency while improving overall quality.
- Flexibility: Configure the number of results and control excerpt length using tunable parameters.
- Token Efficiency: Returns compressed, structured text from the web that minimizes tokens and eliminates the need for post-processing or additional prompting.
- Quality: Built on a web-scale index with advanced ranking and compression techniques that prioritize relevance, clarity, and source reliability.
- Simple Integration: Drop-in replacement for existing keyword-based workflows—just send your search keywords as is to the API and receive structured results.
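As a sketch of what such a drop-in call might look like, the snippet below builds a request body from the fields documented in the Request Fields section. The endpoint URL and the x-api-key header name are illustrative assumptions, not documented values; consult the official docs for the real ones.

```python
import json
import urllib.request

# Example request body using the documented request fields; values are illustrative.
PAYLOAD = {
    "objective": "I want to know when the UN was founded. Prefer UN websites.",
    "search_queries": ["Founding year UN"],
    "processor": "base",
    "max_results": 10,
    "max_chars_per_result": 1500,
}

def build_request(api_key: str,
                  url: str = "https://api.parallel.ai/search") -> urllib.request.Request:
    # NOTE: the URL and the "x-api-key" header are assumptions for illustration.
    return urllib.request.Request(
        url,
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

The ranked excerpts in the JSON response can then be concatenated directly into an LLM prompt without further scraping or extraction.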
Processors
Processors control how each search query is executed. They determine how results are retrieved, ranked, and compressed, and each has different performance and quality characteristics. Pricing per request is based on the processor used—independent of the number of inputs or outputs. Each query must specify exactly one processor.
Choosing a Processor
The Search API currently supports two processors, each optimized for different workloads.
- Use base for fast responses to general web queries, ideal for latency-sensitive applications.
- Use pro for more complex or open-ended queries where freshness and content quality are critical.
Reference
Each processor varies in its performance characteristics and optimal use cases. Use the table below for reference.
Processor | Strengths | p90 Latency | Cost ($ per 1,000 requests) | Max Results Supported |
---|---|---|---|---|
base | Lowest cost and fastest retrieval | 4-5 s | 4 | 40 |
pro | Higher excerpt quality and freshness | 45-70 s | 9 | 40 |
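The latency trade-off in the table can be encoded as a simple selection rule. The helper below is a hypothetical sketch (not part of the API) that picks a processor from a caller's latency budget, using the p90 figures above.

```python
def choose_processor(latency_budget_s: float) -> str:
    """Pick a processor from a latency budget in seconds.

    base's p90 latency is roughly 4-5 s, while pro's is 45-70 s, so any
    budget tighter than pro's lower bound falls back to base. Illustrative
    helper only; thresholds come from the reference table above.
    """
    return "base" if latency_budget_s < 45 else "pro"
```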
Request Fields
Note that at least one of objective or search_queries is required. The remaining fields are optional.
Field | Type | Notes | Example |
---|---|---|---|
objective | string | Natural-language description of the web research goal. Include any source or freshness guidance. | "I want to know when the UN was founded. Prefer UN websites." |
search_queries | string[] | Optional keyword queries to guide the search. | ["Founding year UN", "Year of founding United Nations"] |
processor | enum | Either base or pro. | base |
max_results | int | Maximum number of search results to return. | 10 |
max_chars_per_result | int | Maximum characters per result excerpt. | 1500 |
There are a few restrictions on allowed values:
- objective: limited to 5,000 characters.
- search_queries: at most 5 queries, each with a maximum length of 200 characters.
- max_results: limits are defined per processor; see the reference table above.
- max_chars_per_result: minimum allowed value is 100. Per-result excerpt lengths greater than 30,000 characters are not guaranteed.
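The limits above can be checked client-side before sending a request. The function below is a minimal validation sketch based only on the documented restrictions; the field names match the Request Fields table, but the function itself is illustrative, not part of the API.

```python
def validate_search_request(req: dict) -> list[str]:
    """Return a list of violations of the documented request limits."""
    errors = []
    if not req.get("objective") and not req.get("search_queries"):
        errors.append("at least one of objective or search_queries is required")
    if len(req.get("objective") or "") > 5000:
        errors.append("objective exceeds 5,000 characters")
    queries = req.get("search_queries") or []
    if len(queries) > 5:
        errors.append("at most 5 search_queries are allowed")
    if any(len(q) > 200 for q in queries):
        errors.append("each query is limited to 200 characters")
    if req.get("max_results", 1) > 40:
        errors.append("max_results exceeds the 40-result processor limit")
    if req.get("max_chars_per_result", 100) < 100:
        errors.append("max_chars_per_result must be at least 100")
    return errors
```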