## Features
- SQL-Native: Use `parallel_enrich()` directly in BigQuery SQL queries
- Secure: API key stored in Secret Manager, accessed via Cloud Functions
- Configurable Processors: Choose from `lite-fast` to `ultra` for speed vs. thoroughness tradeoffs
- Structured Output: Returns JSON that can be parsed with BigQuery's `JSON_EXTRACT_SCALAR()`
## Installation
The standalone `parallel-cli` binary does not include deployment commands. You must install via pip to deploy the BigQuery integration.
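With pip available, installation looks like the following (the package name shown is an assumption for illustration; confirm the published name in Parallel's installation docs):

```shell
# Package name assumed for illustration -- check Parallel's docs for the exact name
pip install parallel-web
```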
## Deployment

Unlike Spark, the BigQuery integration requires a one-time deployment step to set up Cloud Functions and remote function definitions in your GCP project.

### Prerequisites
- Google Cloud Project with billing enabled
- Parallel API Key from Parallel
- Google Cloud SDK installed and authenticated:
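Authentication with the Google Cloud SDK can be done as follows (the project ID is a placeholder):

```shell
# Log in and point the SDK at the project you will deploy into
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
```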
### Deploy with CLI

Deploying via the CLI creates the following resources in your project:
- Secret in Secret Manager for your API key
- Cloud Function (Gen2) that handles enrichment requests
- BigQuery Connection for remote function calls
- BigQuery Dataset (`parallel_functions`)
- Remote functions: `parallel_enrich()` and `parallel_enrich_company()`
For manual deployment options, troubleshooting, and cleanup instructions, see the complete BigQuery setup guide.
## Basic Usage
Once deployed, use `parallel_enrich()` in any BigQuery SQL query:
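A sketch of a call is shown below. The `parallel_functions` dataset name comes from the deployment above; the table, column names, and output descriptions are hypothetical, and the argument shape follows the parameter table in this document:

```sql
-- Hypothetical table and columns; parallel_enrich lives in the parallel_functions dataset
SELECT
  company_name,
  parallel_functions.parallel_enrich(
    TO_JSON(STRUCT(company_name AS company, website AS website)),
    JSON '["CEO name", "Year founded"]'
  ) AS enrichment
FROM my_dataset.companies
LIMIT 10;
```

Starting with a `LIMIT` keeps the first run small while you validate the output shape.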
## Function Parameters
| Parameter | Type | Description |
|---|---|---|
| `input_data` | JSON | JSON object with key-value pairs of input data for enrichment |
| `output_columns` | JSON | JSON array of descriptions for the columns you want to enrich |
## Parsing Results
The function returns JSON strings. Field names are converted to snake_case (e.g., "CEO name" → `ceo_name`).
Use `JSON_EXTRACT_SCALAR()` to extract individual fields:
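For example, assuming enrichment results were stored in a table with an `enrichment` column (table and field names here are illustrative):

```sql
-- Pull individual snake_case fields out of the returned JSON string
SELECT
  company_name,
  JSON_EXTRACT_SCALAR(enrichment, '$.ceo_name') AS ceo_name,
  JSON_EXTRACT_SCALAR(enrichment, '$.year_founded') AS year_founded
FROM enriched_results;
```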
## Company Convenience Function
For common company enrichment use cases:
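A hypothetical call shape is sketched below; the exact signature of `parallel_enrich_company()` is not shown in this section, so consult the setup guide before relying on it:

```sql
-- Signature assumed for illustration only; see the setup guide for the real one
SELECT parallel_functions.parallel_enrich_company(
  'Acme Corp',
  JSON '["CEO name", "Employee count", "Headquarters city"]'
) AS company_info;
```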
## Processor Selection

Choose a processor based on your speed vs. thoroughness requirements. See Choose a Processor for detailed guidance and Pricing for cost information. To use a different processor, create a custom remote function with the desired processor in the `user_defined_context`:
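A sketch using BigQuery's remote-function DDL follows. The connection name and endpoint URL are placeholders for the resources created at deployment, and the `processor` context key is an assumption about what the Cloud Function reads:

```sql
-- Placeholders: replace connection and endpoint with your deployed resources
CREATE OR REPLACE FUNCTION parallel_functions.parallel_enrich_core(
  input_data JSON,
  output_columns JSON
)
RETURNS STRING
REMOTE WITH CONNECTION `your-project.us.your-connection`
OPTIONS (
  endpoint = 'https://your-cloud-function-url',
  -- "processor" key assumed; the function's handler must read this context
  user_defined_context = [("processor", "core")]
);
```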
## Best Practices
### Batch sizing
Process data in batches to manage costs and avoid timeouts:
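One way to slice a table into fixed-size batches is a row-number window; the table and column names below are hypothetical:

```sql
-- Enrich the first 100 rows; advance the rn range for subsequent batches
SELECT
  company_name,
  parallel_functions.parallel_enrich(
    TO_JSON(STRUCT(company_name AS company)),
    JSON '["CEO name"]'
  ) AS enrichment
FROM (
  SELECT *, ROW_NUMBER() OVER (ORDER BY company_name) AS rn
  FROM my_dataset.companies
)
WHERE rn BETWEEN 1 AND 100;
```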
### Error handling
Failed enrichments return JSON with an `error` field. Filter these out in your downstream processing:
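For example, assuming results are stored with an `enrichment` column (names illustrative):

```sql
-- Keep only rows whose enrichment JSON has no error field
SELECT *
FROM enriched_results
WHERE JSON_EXTRACT_SCALAR(enrichment, '$.error') IS NULL;
```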
### Cost management
- Use `lite-fast` for high-volume, basic enrichments
- Test with small batches before processing large tables
- Store results to avoid re-enriching the same data