This guide covers Triple Whale's Sync to Data Warehouse feature, which lets you automate data transfers into BigQuery, Snowflake, or AWS S3 at custom intervals using workflows. It includes setup instructions, architecture details, troubleshooting tips, and advanced usage guidance.
Supported Warehouses
- BigQuery: Learn how to set up BigQuery
- Snowflake: Learn how to set up Snowflake
- AWS S3: Learn how to set up AWS S3
Key Concepts
- Data Structure: Transfer data as tables based on user-defined SQL queries.
- Scheduled Syncs: Schedule syncs using custom intervals within Triple Whale workflows.
- Security and Permissions: Control data access and permissions for secure transfers.
Architecture

[Diagram: Data Warehouse Sync architecture]
Security Considerations
- Password Handling: Snowflake connections authenticate with a plain username and password rather than a scoped security token, so create a dedicated account with access limited to the data being synced. (For BigQuery, you only grant write access to a specific dataset, so password handling is not a concern.)
- Data Protection: Users should create isolated datasets or tables specifically for sync operations to avoid compromising broader data access.
Data Structure and Sync Behavior
Table Creation and Schema
- Automatic Table Creation: Tables are created automatically based on the schema defined in the SQL query executed within the Triple Whale platform.
- Data Types: Data types are standardized to ensure compatibility and future-proofing; for instance, low-cardinality Triple Whale columns are represented as strings for flexibility.
- Schema Adjustments: The output table schema is not editable. If a schema change is needed, a new table must be created.
Data Freshness and Historical Data
- Custom Scheduling: Users can customize how often syncs occur through Triple Whale workflows (e.g., hourly, daily).
- Appending Data: Each sync adds new data rows; existing data is not replaced. Users should manage deduplication on their end if needed.
- Historical Sync: Users can run custom SQL queries to extract historical data in batches (e.g., month-by-month) for backfilling.
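Because each sync appends rows, overlapping syncs can leave duplicate records; a common downstream pattern is to keep only the most recently synced row per key. A minimal sketch of that pattern, using an in-memory SQLite database as a stand-in for the warehouse (the table and column names are hypothetical; adapt the SQL to your warehouse's dialect):

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; the dedup pattern
# (keep the latest row per key) translates to most SQL dialects.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE synced_orders (order_id TEXT, revenue REAL, synced_at TEXT)"
)
# Two syncs appended overlapping data for order 'A-1'.
conn.executemany(
    "INSERT INTO synced_orders VALUES (?, ?, ?)",
    [
        ("A-1", 100.0, "2024-01-01T00:00:00"),
        ("A-2", 50.0, "2024-01-01T00:00:00"),
        ("A-1", 120.0, "2024-01-02T00:00:00"),  # later sync, updated revenue
    ],
)

# Deduplicated view: keep only the most recently synced row per order_id.
rows = conn.execute(
    """
    SELECT order_id, revenue
    FROM (
        SELECT order_id, revenue,
               ROW_NUMBER() OVER (
                   PARTITION BY order_id ORDER BY synced_at DESC
               ) AS rn
        FROM synced_orders
    )
    WHERE rn = 1
    ORDER BY order_id
    """
).fetchall()
print(rows)  # [('A-1', 120.0), ('A-2', 50.0)]
```

In practice you would run the deduplicating query as a scheduled view or MERGE inside the warehouse rather than pulling rows into Python.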
Common Errors and Troubleshooting
- Connection Issues: If setup fails, verify that the credentials, project ID, and dataset ID you provided are correct, then re-run the setup process if errors persist.
- Permission Errors: Ensure that the service account or user account has been granted the necessary permissions for writing data.
- Table Name Conflicts: Verify that the new table name is unique to prevent overwrite or duplication errors.
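One simple way to avoid table name conflicts is to derive a fresh name from a base name plus a version suffix, checking against the names that already exist. A minimal sketch (the naming convention and helper are illustrative, not part of Triple Whale's API):

```python
def unique_table_name(base: str, existing: set[str]) -> str:
    """Return base if it is free, otherwise append an incrementing _vN suffix."""
    if base not in existing:
        return base
    n = 2
    while f"{base}_v{n}" in existing:
        n += 1
    return f"{base}_v{n}"

# Hypothetical set of tables already present in the target dataset.
existing = {"orders_sync", "orders_sync_v2"}
print(unique_table_name("orders_sync", existing))    # orders_sync_v3
print(unique_table_name("sessions_sync", existing))  # sessions_sync
```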
Best Practices
- Managing Schema Changes: If a schema change is necessary, create a new table with the updated schema.
- Performance Optimization: For optimal performance, avoid syncing overly large data sets at frequent intervals. Break down data into manageable chunks if possible.
- Testing Connections: Run a test sync with a small query to confirm that the connection works before syncing larger datasets.
- Permission Management: To maintain data security, manually restrict data access through allow-listing or by creating service accounts with user-specific permissions.
- Data Segmentation: Sync data in manageable batches to optimize performance and avoid timeouts.
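For historical backfills and segmented syncs, the date range can be split into month-sized windows, each feeding one sync query. A minimal sketch of that splitting logic (the function name and return shape are illustrative):

```python
from datetime import date

def month_windows(start: date, end: date):
    """Yield (window_start, window_end) pairs covering [start, end) month by month."""
    cur = start
    while cur < end:
        # First day of the following month.
        if cur.month == 12:
            nxt = date(cur.year + 1, 1, 1)
        else:
            nxt = date(cur.year, cur.month + 1, 1)
        yield cur, min(nxt, end)
        cur = nxt

windows = list(month_windows(date(2024, 1, 15), date(2024, 4, 1)))
print(windows)
# [(date(2024, 1, 15), date(2024, 2, 1)),
#  (date(2024, 2, 1), date(2024, 3, 1)),
#  (date(2024, 3, 1), date(2024, 4, 1))]
```

Each window's start and end dates can then be interpolated into the WHERE clause of the sync's SQL query.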
FAQs
How often does the sync run?
Users can configure the sync frequency within the Triple Whale Workflow Scheduler.
Is data deduplication handled?
No, users need to implement deduplication strategies within their data warehouse.
What if schema changes are required?
Create a new table and update your workflow to use the new schema.
How secure is the data during the sync process?
Data security is maintained through user-defined permissions and encrypted connections. Users should manage data access through restricted accounts or specific data sets.
What happens if a sync fails?
While detailed client-facing error logs are not yet available, users should check connection settings and workflow configurations. If problems persist, contact Triple Whale support.
Can I include specific fields or exclude certain data?
You can customize the SQL query to control exactly which fields and rows are included in the synced tables.