Overview

Scheduled jobs allow you to run the same FindAll query on a regular basis to discover newly emerging entities and track changes to existing ones. This is ideal for ongoing monitoring use cases like market intelligence, lead generation, or competitive tracking. Rather than manually re-running queries, you can programmatically create new FindAll runs using a previous run’s schema, while excluding candidates you’ve already discovered.

Use Cases

Scheduled FindAll jobs are particularly useful for:
  • Market monitoring: Track new companies entering a market space over time
  • Lead generation: Continuously discover new potential customers matching your criteria
  • Competitive intelligence: Monitor emerging competitors and new funding announcements
  • Investment research: Track new companies meeting specific investment criteria
  • Regulatory compliance: Discover new entities that may require compliance review

How It Works

Creating a scheduled FindAll job involves two steps:
  1. Retrieve the schema from a previous successful run
  2. Create a new run using that schema, with an exclude list of previously discovered candidates
This approach (sketched in code after this list) ensures:
  • Consistent criteria: Use the exact same evaluation logic across runs
  • No duplicates: Automatically exclude candidates from previous runs
  • Cost efficiency: Only pay to evaluate net new candidates
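
Below is a minimal sketch of the two calls, using the endpoints, headers, and example values from the steps that follow; a fuller version with polling and result handling appears later on this page:
Python
import requests

BASE_URL = "https://api.parallel.ai/v1beta"
HEADERS = {
    "x-api-key": "your_api_key",
    "parallel-beta": "findall-2025-09-15",
}

previous_findall_id = "findall_40e0ab8c10754be0b7a16477abb38a2f"  # a completed run
already_discovered = [  # candidates found in earlier runs
    {"name": "Anthropic", "url": "https://www.anthropic.com/"},
]

# Step 1: retrieve the schema from the previous run
schema_resp = requests.get(
    f"{BASE_URL}/findall/runs/{previous_findall_id}/schema", headers=HEADERS
)
schema_resp.raise_for_status()

# Step 2: create a new run with the same schema, skipping known candidates
run_resp = requests.post(
    f"{BASE_URL}/findall/runs",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={**schema_resp.json(), "exclude_list": already_discovered},
)
run_resp.raise_for_status()
print(run_resp.json()["findall_id"])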

Step 1: Retrieve the Schema

Get the schema from a completed FindAll run to reuse its entity_type, match_conditions, and enrichments:
cURL
curl -X GET "https://api.parallel.ai/v1beta/findall/runs/${FINDALL_ID}/schema" \
  -H "x-api-key: $PARALLEL_API_KEY" \
  -H "parallel-beta: findall-2025-09-15"
Response:
{
  "objective": "Find all portfolio companies of Khosla Ventures founded after 2020",
  "entity_type": "companies",
  "match_conditions": [
    {
      "name": "khosla_ventures_portfolio_check",
      "description": "Company must be a portfolio company of Khosla Ventures."
    },
    {
      "name": "founded_after_2020_check",
      "description": "Company must have been founded after 2020."
    }
  ],
  "enrichments": [
    {
      "name": "funding_amount",
      "description": "Total funding raised by the company in USD"
    }
  ],
  "generator": "core",
  "match_limit": 50
}

Step 2: Create a New Run with exclude_list

Use the retrieved schema to create a new FindAll run, adding an exclude_list parameter to skip candidates you’ve already discovered:
cURL
curl -X POST "https://api.parallel.ai/v1beta/findall/runs" \
  -H "x-api-key: $PARALLEL_API_KEY" \
  -H "parallel-beta: findall-2025-09-15" \
  -H "Content-Type: application/json" \
  -d '{
    "objective": "Find all portfolio companies of Khosla Ventures founded after 2020",
    "entity_type": "companies",
    "match_conditions": [
      {
        "name": "khosla_ventures_portfolio_check",
        "description": "Company must be a portfolio company of Khosla Ventures."
      },
      {
        "name": "founded_after_2020_check",
        "description": "Company must have been founded after 2020."
      }
    ],
    "enrichments": [
      {
        "name": "funding_amount",
        "description": "Total funding raised by the company in USD"
      }
    ],
    "generator": "core",
    "match_limit": 50,
    "exclude_list": [
      {
        "name": "Anthropic",
        "url": "https://www.anthropic.com/"
      },
      {
        "name": "Adept AI",
        "url": "https://adept.ai/"
      },
      {
        "name": "Liquid AI",
        "url": "https://www.liquid.ai/"
      }
    ]
  }'

Exclude List Parameters

The exclude_list is an array of candidate objects to exclude. Each object contains:
  • name (string, required): Name of the candidate to exclude
  • url (string, required): URL of the candidate to exclude
How exclusions work:
  • Candidates matching any entry in the exclude_list will be skipped during generation
  • This prevents re-evaluating entities you’ve already processed
  • Exclusions are matched by URL—ensure URLs are normalized consistently across runs
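
Because matching is URL-based, it helps to run every URL through the same normalization before storing or submitting it. A minimal sketch follows; the specific rules here (lowercased host, trailing slash stripped) are illustrative assumptions, not documented API behavior:
Python
from urllib.parse import urlparse, urlunparse

def normalize_url(url: str) -> str:
    """Normalize an absolute URL so exclusions match consistently across runs."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()      # hostnames are case-insensitive
    path = parsed.path.rstrip("/")    # drop trailing slashes
    return urlunparse((parsed.scheme, host, path, "", "", ""))

# normalize_url("https://WWW.Anthropic.com/") -> "https://www.anthropic.com"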

Building Your Exclude List

To construct the exclude_list from previous runs, retrieve the matched candidates and extract their name and url fields:
cURL
curl -X GET "https://api.parallel.ai/v1beta/findall/runs/${FINDALL_ID}/result" \
  -H "x-api-key: $PARALLEL_API_KEY" \
  -H "parallel-beta: findall-2025-09-15"
Extract the name and url fields from each matched candidate:
{
  "findall_id": "findall_40e0ab8c10754be0b7a16477abb38a2f",
  "matched_candidates": [
    {
      "candidate_id": "candidate_abc123",
      "name": "Anthropic",
      "url": "https://www.anthropic.com/",
      "match_status": "matched",
      ...
    },
    {
      "candidate_id": "candidate_def456",
      "name": "Adept AI",
      "url": "https://adept.ai/",
      "match_status": "matched",
      ...
    }
  ]
}
Store these candidates and pass them as the exclude_list array in subsequent runs.
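
As a concrete sketch, the helper below fetches a run's matched candidates and merges them into a JSON file on disk. The file-based storage and the exclude_list.json filename are illustrative choices; any persistent store works:
Python
import json
import requests

BASE_URL = "https://api.parallel.ai/v1beta"
HEADERS = {
    "x-api-key": "your_api_key",
    "parallel-beta": "findall-2025-09-15",
}

def update_exclude_list(findall_id: str, path: str = "exclude_list.json") -> list:
    """Merge a run's matched candidates into a persisted exclude list."""
    response = requests.get(
        f"{BASE_URL}/findall/runs/{findall_id}/result", headers=HEADERS
    )
    response.raise_for_status()

    try:
        with open(path) as f:
            exclude = json.load(f)
    except FileNotFoundError:
        exclude = []

    known_urls = {entry["url"] for entry in exclude}
    for candidate in response.json().get("matched_candidates", []):
        if candidate["url"] not in known_urls:
            exclude.append({"name": candidate["name"], "url": candidate["url"]})
            known_urls.add(candidate["url"])

    with open(path, "w") as f:
        json.dump(exclude, f, indent=2)
    return exclude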

Example: Weekly Scheduled Job

Here’s a complete example implementing the logic for a weekly FindAll job (scheduling options are shown after the script):
import requests
import time
from datetime import datetime

PARALLEL_API_KEY = "your_api_key"
BASE_URL = "https://api.parallel.ai/v1beta"
HEADERS = {
    "x-api-key": PARALLEL_API_KEY,
    "parallel-beta": "findall-2025-09-15",
    "Content-Type": "application/json"
}

# Store the original findall_id from your first run
ORIGINAL_FINDALL_ID = "findall_40e0ab8c10754be0b7a16477abb38a2f"

# Track all discovered candidates across runs. Held in memory here for
# simplicity; in production, persist this list (e.g., to a database or file)
# so exclusions survive process restarts
all_discovered_candidates = []

def get_schema(findall_id):
    """Retrieve schema from a previous run"""
    response = requests.get(
        f"{BASE_URL}/findall/runs/{findall_id}/schema",
        headers=HEADERS
    )
    response.raise_for_status()
    return response.json()

def get_matched_candidates(findall_id):
    """Get all matched candidates from a run"""
    response = requests.get(
        f"{BASE_URL}/findall/runs/{findall_id}/result",
        headers=HEADERS
    )
    response.raise_for_status()
    return response.json().get("matched_candidates", [])

def create_scheduled_run(schema, exclude_candidates):
    """Create a new FindAll run with exclusions"""
    payload = {
        **schema,
        "generator": "core",
        "match_limit": 50,
        "exclude_list": exclude_candidates
    }

    response = requests.post(
        f"{BASE_URL}/findall/runs",
        headers=HEADERS,
        json=payload
    )
    response.raise_for_status()
    return response.json()["findall_id"]

def run_weekly_job():
    """Execute a scheduled FindAll job"""
    print(f"Starting scheduled job at {datetime.now()}")

    # Step 1: Get schema from original run
    schema = get_schema(ORIGINAL_FINDALL_ID)
    print(f"Retrieved schema: {schema['objective']}")

    # Step 2: Create new run with exclusions
    new_findall_id = create_scheduled_run(schema, all_discovered_candidates)
    print(f"Created new run: {new_findall_id}")

    # Step 3: Poll for completion (simplified)
    while True:
        response = requests.get(
            f"{BASE_URL}/findall/runs/{new_findall_id}",
            headers=HEADERS
        )
        response.raise_for_status()
        status = response.json()["status"]["status"]

        if status in ["completed", "failed", "cancelled"]:
            break

        time.sleep(30)  # Poll every 30 seconds

    # Step 4: Get new matched candidates
    new_candidates = get_matched_candidates(new_findall_id)
    print(f"Found {len(new_candidates)} new candidates")

    # Step 5: Update exclude list for next run
    for candidate in new_candidates:
        all_discovered_candidates.append({
            "name": candidate["name"],
            "url": candidate["url"]
        })

    return new_candidates

# Run the job
if __name__ == "__main__":
    new_results = run_weekly_job()
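
The script above implements the job logic only; how it gets triggered each week is up to your scheduler. As one illustration, the snippet below continues the script using the third-party schedule package (an assumption about your environment; cron or any other job runner works just as well):
Python
import schedule  # pip install schedule

# Run the job every Monday morning; adjust to your domain's update rate
schedule.every().monday.at("09:00").do(run_weekly_job)

while True:
    schedule.run_pending()
    time.sleep(60)  # time is already imported in the script above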

Best Practices

Schema Modifications

While you should keep match_conditions consistent across runs, you can adjust:
  • objective: Update to reflect the current time period (e.g., “founded in 2024” → “founded in 2025”)
  • enrichments: Add new enrichment fields without affecting matching logic
  • match_limit: Adjust based on expected growth rate
  • generator: Change generators if needed (though this may affect result quality)
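
For example, starting from the schema retrieved in Step 1 (reusing get_schema from the weekly example above), you might adjust these fields before creating the next run; the employee_count enrichment is a hypothetical addition:
Python
schema = get_schema(ORIGINAL_FINDALL_ID)

# Shift the objective to the current period; match_conditions stay unchanged
schema["objective"] = schema["objective"].replace(
    "founded after 2020", "founded after 2021"
)

# Add an enrichment field without affecting the matching logic
schema["enrichments"].append({
    "name": "employee_count",
    "description": "Approximate number of employees"
})

# Allow for more new entities this period
schema["match_limit"] = 100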

Exclude List Management

  • Persist candidates: Store discovered candidate objects (name and URL) in a database or file for long-term tracking
  • Normalize URLs: Ensure consistent URL formatting (trailing slashes, protocols, etc.) across runs
  • Periodic resets: Consider occasionally running without exclusions to catch entities that may have changed
  • Monitor list size: Very large exclude lists (>10,000 candidates) may impact performance

Scheduling

  • Frequency: Choose intervals based on your domain’s update rate (daily, weekly, monthly)
  • Off-peak hours: Schedule jobs during low-traffic periods if possible
  • Webhooks: Use webhooks to get notified when jobs complete
  • Error handling: Implement retry logic for failed runs
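
For the retry logic, a minimal wrapper around the run-creation call (reusing create_scheduled_run from the example above) might look like this; the attempt count and backoff values are arbitrary starting points:
Python
import time
import requests

def create_run_with_retries(schema, exclude_candidates, max_attempts=3):
    """Retry run creation on transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return create_scheduled_run(schema, exclude_candidates)
        except requests.RequestException:
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # wait 2s, then 4s, before retrying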

Cost Optimization

  • Start small: Use lower match_limit values initially, then extend if needed
  • Preview first: Test schema changes with preview before running full jobs
  • Monitor metrics: Track generated_candidates_count vs matched_candidates_count to optimize criteria
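
As a sketch of metric tracking, assuming generated_candidates_count and matched_candidates_count are returned on the run object (verify the exact field locations against the API Reference):
Python
run = requests.get(
    f"{BASE_URL}/findall/runs/{findall_id}", headers=HEADERS
).json()

# Hypothetical field locations; check the API Reference for the real shape
generated = run.get("generated_candidates_count", 0)
matched = run.get("matched_candidates_count", 0)
if generated:
    print(f"Match rate: {matched / generated:.1%} ({matched}/{generated})")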

Next Steps

  • Preview: Test queries with ~10 candidates before running full searches
  • Generators and Pricing: Understand generator options and pricing
  • Enrichments: Extract additional structured data for matched candidates
  • Extend Runs: Increase match limits without paying new fixed costs
  • Webhooks: Configure HTTP callbacks for run completion and matches
  • Streaming Events: Receive real-time updates via Server-Sent Events
  • Run Lifecycle: Understand run statuses and how to cancel runs
  • API Reference: Complete endpoint documentation