SQL Example Library Overview and Best Practices
The SQL Example Library contains working examples of SQL queries built for the Triple Whale data model. Each query solves a specific analytical problem and follows best practices for performance, readability, and correctness. The goal is to help advanced SQL users build dashboards, reports, and agents more efficiently using trusted patterns.
Each example includes a complete working query, a breakdown of the tables and metrics involved, and customization tips for adapting the pattern to your specific use case.
All examples are written for Triple Whale's ClickHouse-driven SQL Editor.
Working with ClickHouse
If you're used to BigQuery-flavored SQL, note that ClickHouse has different syntax rules, especially around case sensitivity, aliases, array handling, and time functions.
For a full comparison—including function mappings and common migration pitfalls—refer to our dedicated guide: Moving from BigQuery to ClickHouse syntax in the SQL Editor
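As a quick illustration of the kind of differences to expect, here is a small sketch (the BigQuery form is shown as a comment; the table and column names mirror the examples later in this guide):
-- BigQuery: DATE_TRUNC(event_date, MONTH) and COUNTIF(...)
-- ClickHouse equivalent in the SQL Editor:
SELECT
    toStartOfMonth(event_date) AS month,
    countIf(order_revenue > 0) AS paid_orders
FROM orders_table
WHERE event_date BETWEEN @start_date AND @end_date
GROUP BY month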
Use Moby to Start Your Query
Before writing SQL from scratch, try asking Moby, Triple Whale’s AI agent, to generate a starting point. Moby understands the Triple Whale data model and can suggest the right tables, filters, and metrics based on your plain-English prompt.
You’ll save time, avoid common mistakes, and often discover better query structures than you would by starting manually.
Example prompts:
- “ROAS by channel last week”
- “How many first-time customers bought each product?”
- “Conversion rate over time for Klaviyo flows”
Once Moby gives you a draft, use the best practices below—and your SQL knowledge—to refine and adapt the query to your exact needs.
Best Practices for Writing SQL in Triple Whale
These examples are designed to help you avoid common pitfalls and write queries that are fast, stable, and easy to maintain.
Use pixel_joined_tvf for Ad & Attribution Data
This view combines ad spend, sessions, orders, revenue, attribution models, and new customer tags. It is optimized and more reliable than joining raw ads, sessions, and orders yourself.
-- ✅ Use this view directly
SELECT
    channel,
    SUM(spend) AS spend,
    SUM(order_revenue) AS revenue
FROM pixel_joined_tvf
WHERE event_date BETWEEN @start_date AND @end_date
GROUP BY channel
-- ❌ Avoid pulling separately from ads_table, orders_table, and sessions_table and joining them manually
Split complex logic into CTEs for readability
Avoid excessive nesting and complexity by splitting logic into clear CTEs and avoiding multi-table joins in a single step.
-- ✅ Easier to debug
WITH recent_orders AS (
    SELECT customer_id, order_id
    FROM orders_table
    WHERE event_date > '2024-01-01'
)
SELECT customer_id, COUNT(DISTINCT order_id)
FROM recent_orders
GROUP BY customer_id
-- ❌ Hard to maintain
SELECT customer_id, COUNT(DISTINCT IF(order_date > '2024-01-01', order_id, NULL))
FROM orders_table o
JOIN ads_table a ON o.ad_id = a.ad_id
JOIN sessions s ON s.session_id = o.session_id
Use canonical aliases—but avoid ambiguous references
Triple Whale dashboards and agents rely on specific column aliases (like spend, order_revenue, clicks) to display the correct metric names, formatting, and tooltips. If you don’t use the canonical alias in your final SELECT (e.g. AS spend), the UI may fall back to generic labels or omit important context. See the data dictionary for full canonical table schemas.
But in ClickHouse, aliases are in scope throughout the query—including the WHERE clause—so using the same name for both the raw column and the aggregation can cause errors or incorrect results.
Best practices:
- In your final SELECT, use canonical aliases so the UI renders metric context correctly
- In filters or intermediate logic, prefix raw columns with a table alias (e.g. a.spend)
- Avoid referencing a reused alias in the WHERE clause; if needed, use an alternate alias (e.g. total_spend) during earlier steps
-- ✅ Good: Canonical alias + disambiguated filter
SELECT SUM(spend) AS spend
FROM ads_table a
WHERE a.spend > 100
-- ❌ Ambiguous filter may break or give wrong results
SELECT SUM(spend) AS spend
FROM ads_table
WHERE spend > 100
-- ❌ Non-canonical alias breaks UI metadata
SELECT SUM(spend) AS total_spend
FROM ads_table a
WHERE a.spend > 100
Filter early and pre-aggregate before joining
Joining or grouping directly on raw event-level data can lead to slow, error-prone queries due to row explosion and duplicated values. Instead, filter your data early and pre-aggregate at the right grain before applying joins or calculations.
This approach improves query performance, avoids double-counting, and makes business logic easier to manage.
-- ✅ Good: Filter and summarize in CTEs before joining and final calculations
WITH filtered_orders AS (
    SELECT
        ad_id,
        toStartOfMonth(event_date) AS month,
        SUM(order_revenue) AS revenue
    FROM orders_table
    WHERE event_date BETWEEN @start_date AND @end_date
    GROUP BY ad_id, month
),
filtered_ads AS (
    SELECT
        ad_id,
        campaign_name,
        channel
    FROM ads_table
    WHERE channel IN ('facebook-ads', 'google-ads', 'bing')
)
SELECT
    o.month,
    a.channel,
    SUM(o.revenue) AS total_revenue
FROM filtered_orders o
JOIN filtered_ads a ON o.ad_id = a.ad_id
GROUP BY o.month, a.channel
ORDER BY o.month, a.channel
-- ❌ Bad: Join and group on raw event-level data
SELECT
    toStartOfMonth(o.event_date) AS month,
    a.channel,
    SUM(o.order_revenue) AS total_revenue
FROM orders_table o
JOIN ads_table a ON o.ad_id = a.ad_id
WHERE
    o.event_date BETWEEN @start_date AND @end_date
    AND a.channel IN ('facebook-ads', 'google-ads', 'bing')
GROUP BY month, a.channel
ORDER BY month, a.channel
Handle edge cases explicitly
Common issues:
- Customers with only one order
- Orders with empty products_info
- Campaigns with zero impressions
Use COALESCE, IF, and HAVING clauses to handle these safely.
-- ✅ Defensive logic
SELECT
    campaign_id,
    COALESCE(SUM(spend), 0) AS total_spend
FROM ads_table
GROUP BY campaign_id
HAVING total_spend > 0
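For the empty products_info case, one simple option is to filter those orders out before any product-level logic. A minimal sketch, assuming products_info is an array column on orders_table as in the other examples:
-- ✅ Exclude orders with no product details
SELECT order_id, products_info
FROM orders_table
WHERE notEmpty(products_info)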
Guard rate calculations with nullIf
ClickHouse will throw a division-by-zero error if not handled manually. Use nullIf() to protect the denominator.
-- ✅ Safe division
SELECT SUM(revenue) / nullIf(SUM(spend), 0) AS roas
-- ❌ This will fail if spend is zero
SELECT SUM(revenue) / SUM(spend) AS roas
Use @start_date and @end_date parameters
These parameters are compatible with dashboard filters and agent queries.
-- ✅ Use dynamic date range
WHERE event_date BETWEEN @start_date AND @end_date
-- ❌ Hardcoded dates are less flexible
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
Be precise with product journey definitions
Use different techniques depending on whether you're analyzing co-purchases (products in the same order) or sequential purchases (products bought in later orders).
Window functions like any() and ROW_NUMBER() are helpful for tracking customer journeys over time—such as identifying what customers buy next or ranking user events within a session.
-- ✅ Sequential (Next Order)
SELECT
    customer_id,
    any(products_info) OVER (
        PARTITION BY customer_id
        ORDER BY created_at ASC
        ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING
    ) AS next_order_products
FROM orders_table
-- ✅ Co-Purchase (Same Order)
SELECT customer_id, product.product_id
FROM orders_table
ARRAY JOIN products_info AS product
Use array functions for product-level logic
Triple Whale product data is stored in arrays. Avoid arrayJoin unless necessary—use array-safe functions like arraySum, arrayMap, or arrayReduce to avoid inflating row counts.
-- ✅ Proper array usage
SELECT arrayMap(x -> x.product_id, products_info) AS product_ids
-- ❌ Avoid flattening arrays without context
-- Leads to duplication or inflated aggregates
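For instance, arraySum can aggregate values inside each order's array without flattening rows first. This is only a sketch: it assumes each element of products_info exposes a price field, which may differ in your schema.
-- ✅ Sum item values per order without arrayJoin (price is an assumed field name)
SELECT
    order_id,
    arraySum(x -> x.price, products_info) AS order_item_value
FROM orders_table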
Use GROUPING SETS when you need rollups
GROUPING SETS can reduce query duplication, but don't use it where simpler logic works. Only use GROUPING SETS if you want both granular breakdowns and a total row; avoid using it when only a single breakdown is needed.
-- ✅ Useful when you want both per-channel and total
GROUP BY GROUPING SETS ((channel), ())
-- ❌ Avoid when only one breakdown level is needed
GROUP BY channel
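Put together, a fuller sketch of the rollup pattern might look like this (using ads_table and the canonical spend alias from the earlier examples):
-- ✅ Per-channel rows plus a grand-total row in one pass
-- (by default, the grand-total row shows an empty channel value rather than NULL)
SELECT
    channel,
    SUM(spend) AS spend
FROM ads_table
GROUP BY GROUPING SETS ((channel), ())
ORDER BY channel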