SQL Example Library Overview and Best Practices

The SQL Example Library contains working examples of SQL queries built for the Triple Whale data model. Each query solves a specific analytical problem and follows best practices for performance, readability, and correctness. The goal is to help advanced SQL users build dashboards, reports, and agents more efficiently using trusted patterns.

Each example includes a complete working query, a breakdown of the tables and metrics involved, and customization tips for adapting the pattern to your specific use case.

All examples are written for Triple Whale's ClickHouse-driven SQL Editor.

Working with ClickHouse

If you're used to BigQuery-flavored SQL, note that ClickHouse has different syntax rules, especially around case sensitivity, aliases, array handling, and time functions.

For a full comparison—including function mappings and common migration pitfalls—refer to our dedicated guide: Moving from BigQuery to ClickHouse syntax in the SQL Editor
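
For a quick sense of the differences, here are a few illustrative mappings (table and column names are placeholders; the dedicated guide covers the full list):

-- BigQuery: DATE_TRUNC(event_date, MONTH)
-- ClickHouse: toStartOfMonth(event_date)

-- BigQuery: TIMESTAMP_DIFF(end_ts, start_ts, DAY)
-- ClickHouse: dateDiff('day', start_ts, end_ts)

-- BigQuery: FROM orders, UNNEST(products_info) AS product
-- ClickHouse: FROM orders ARRAY JOIN products_info AS product

-- Also note: ClickHouse identifiers are case-sensitive (event_date and Event_Date are different columns)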

Use Moby to Start Your Query

Before writing SQL from scratch, try asking Moby, Triple Whale’s AI agent, to generate a starting point. Moby understands the Triple Whale data model and can suggest the right tables, filters, and metrics based on your plain-English prompt.

You’ll save time, avoid common mistakes, and often discover better query structures than you would by starting from scratch.

Example prompts:

  • “ROAS by channel last week”
  • “How many first-time customers bought each product?”
  • “Conversion rate over time for Klaviyo flows”

Once Moby gives you a draft, use the best practices below—and your SQL knowledge—to refine and adapt the query to your exact needs.

Best Practices for Writing SQL in Triple Whale

These examples are designed to help you avoid common pitfalls and write queries that are fast, stable, and easy to maintain.

Use pixel_joined_tvf for Ad & Attribution Data

This view combines ad spend, sessions, orders, revenue, attribution models, and new customer tags. It is optimized and more reliable than joining raw ads, sessions, and orders yourself.

-- ✅ Use this view directly
SELECT
  channel,
  SUM(spend) AS spend,
  SUM(order_revenue) AS revenue
FROM pixel_joined_tvf
WHERE event_date BETWEEN @start_date AND @end_date
GROUP BY channel

-- ❌ Avoid pulling separately from ads_table, orders_table, and sessions_table and joining them manually
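
As a sketch of how this comes together, the same view can answer a ROAS-by-channel question in a single pass. Only the columns already shown above are used, and nullIf() guards the division (covered later in this guide):

-- ✅ Sketch: ROAS by channel from pixel_joined_tvf
SELECT
  channel,
  SUM(spend) AS spend,
  SUM(order_revenue) AS revenue,
  SUM(order_revenue) / nullIf(SUM(spend), 0) AS roas
FROM pixel_joined_tvf
WHERE event_date BETWEEN @start_date AND @end_date
GROUP BY channel
ORDER BY roas DESC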

Split complex logic into CTEs for readability

Avoid excessive nesting and complexity by splitting logic into clear CTEs, and avoid multi-table joins in a single step.

-- ✅ Easier to debug
WITH recent_orders AS (
  SELECT customer_id, order_id
  FROM orders_table
  WHERE event_date > '2024-01-01'
)
SELECT customer_id, COUNT(DISTINCT order_id) AS order_count
FROM recent_orders
GROUP BY customer_id

-- ❌ Hard to maintain
SELECT customer_id, COUNT(DISTINCT IF(event_date > '2024-01-01', order_id, NULL)) AS order_count
FROM orders_table o
JOIN ads_table a ON o.ad_id = a.ad_id
JOIN sessions_table s ON s.session_id = o.session_id
GROUP BY customer_id

Use canonical aliases—but avoid ambiguous references

Triple Whale dashboards and agents rely on specific column aliases (like spend, order_revenue, clicks) to display the correct metric names, formatting, and tooltips. If you don’t use the canonical alias in your final SELECT (e.g. AS spend), the UI may fall back to generic labels or omit important context. See the data dictionary for full canonical table schemas.

But in ClickHouse, aliases are in scope throughout the query—including the WHERE clause—so using the same name for both the raw column and the aggregation can cause errors or incorrect results.

Best practices:

  • In your final SELECT, use canonical aliases so the UI renders metric context correctly
  • In filters or intermediate logic, prefix raw columns with a table alias (e.g. a.spend)
  • Avoid referencing a reused alias in the WHERE clause; if needed, use an alternate alias (e.g. total_spend) during earlier steps

-- ✅ Good: Canonical alias + disambiguated filter
SELECT SUM(spend) AS spend
FROM ads_table a
WHERE a.spend > 100

-- ❌ Ambiguous filter may break or give wrong results
SELECT SUM(spend) AS spend
FROM ads_table
WHERE spend > 100

-- ❌ Non-canonical alias breaks UI metadata
SELECT SUM(spend) AS total_spend
FROM ads_table a
WHERE a.spend > 100

Filter early and pre-aggregate before joining

Joining or grouping directly on raw event-level data can lead to slow, error-prone queries due to row explosion and duplicated values. Instead, filter your data early and pre-aggregate at the right grain before applying joins or calculations.

This approach improves query performance, avoids double-counting, and makes business logic easier to manage.

-- ✅ Good: Filter and summarize in CTEs before joining and final calculations
WITH filtered_orders AS (
  SELECT
    ad_id,
    toStartOfMonth(event_date) AS month,
    SUM(order_revenue) AS revenue
  FROM orders_table
  WHERE
    event_date BETWEEN @start_date AND @end_date
  GROUP BY ad_id, month
),
filtered_ads AS (
  SELECT
    ad_id,
    campaign_name,
    channel
  FROM ads_table
  WHERE
    channel IN ('facebook-ads', 'google-ads', 'bing')
)
SELECT
  o.month,
  a.channel,
  SUM(o.revenue) AS total_revenue
FROM filtered_orders o
JOIN filtered_ads a ON o.ad_id = a.ad_id
GROUP BY o.month, a.channel
ORDER BY o.month, a.channel

-- ❌ Bad: Join and group on raw event-level data
SELECT
  toStartOfMonth(o.event_date) AS month,
  a.channel,
  SUM(o.order_revenue) AS total_revenue
FROM orders_table o
JOIN ads_table a ON o.ad_id = a.ad_id
WHERE
  o.event_date BETWEEN @start_date AND @end_date
  AND a.channel IN ('facebook-ads', 'google-ads', 'bing')
GROUP BY month, a.channel
ORDER BY month, a.channel

Handle edge cases explicitly

Common issues:

  • Customers with only one order
  • Orders with empty products_info
  • Campaigns with zero impressions

Use COALESCE, IF, and HAVING clauses to handle these safely.

-- ✅ Defensive logic
SELECT campaign_id,
       COALESCE(SUM(spend), 0) AS total_spend
FROM ads_table
GROUP BY campaign_id
HAVING total_spend > 0
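
For instance, a minimal sketch that skips orders with an empty products_info array and keeps only repeat customers (it reuses the orders_table columns seen elsewhere in this guide):

-- ✅ Skip empty product arrays and single-order customers
SELECT customer_id,
       COUNT(DISTINCT order_id) AS order_count
FROM orders_table
WHERE notEmpty(products_info)
GROUP BY customer_id
HAVING order_count > 1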

Guard rate calculations with nullIf

Depending on the column types, dividing by zero in ClickHouse either throws a division-by-zero error or returns inf/nan instead of a usable value. Use nullIf() to protect the denominator.

-- ✅ Safe division
SELECT SUM(revenue) / nullIf(SUM(spend), 0) AS roas

-- ❌ Unsafe when total spend is zero (errors or returns inf/nan)
SELECT SUM(revenue) / SUM(spend) AS roas

Use @start_date and @end_date parameters

These parameters are compatible with dashboard filters and agent queries.

-- ✅ Use dynamic date range
WHERE event_date BETWEEN @start_date AND @end_date

-- ❌ Hardcoded dates are less flexible
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'

Be precise with product journey definitions

Use different techniques depending on whether you're analyzing co-purchases (products in the same order) or sequential purchases (products bought in later orders).

Window functions like any() and ROW_NUMBER() are helpful for tracking customer journeys over time—such as identifying what customers buy next or ranking user events within a session.

-- ✅ Sequential (Next Order)
SELECT customer_id, any(products_info) OVER (
  PARTITION BY customer_id
  ORDER BY created_at ASC
  ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING
) AS next_order_products
FROM orders_table

-- ✅ Co-Purchase (Same Order)
SELECT customer_id, product.product_id
FROM orders_table
ARRAY JOIN products_info AS product
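
If you need an explicit purchase sequence rather than a look-ahead, ROW_NUMBER() can rank each customer's orders. This is an illustrative sketch that reuses the orders_table columns from the examples above:

-- ✅ Rank each customer's orders (1 = first purchase)
SELECT
  customer_id,
  order_id,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at ASC) AS order_rank
FROM orders_table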

Use array functions for product-level logic

Triple Whale product data is stored in arrays. Avoid arrayJoin unless necessary—use array-safe functions like arraySum, arrayMap, or arrayReduce to avoid inflating row counts.

-- ✅ Proper array usage
SELECT arrayMap(x -> x.product_id, products_info) AS product_ids

-- ❌ Avoid flattening arrays without context
-- Leads to duplication or inflated aggregates
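
For example, arraySum can aggregate inside the array without flattening it. The price and quantity fields below are assumed names for illustration; swap in the actual fields of your products_info elements:

-- ✅ Aggregate inside the array instead of flattening it (price/quantity are assumed field names)
SELECT
  order_id,
  arraySum(x -> x.price * x.quantity, products_info) AS order_item_value
FROM orders_table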

Use GROUPING SETS when you need rollups

GROUPING SETS can reduce query duplication, but don't use it where simpler logic works.

Only use GROUPING SETS if you want both granular breakdowns and a total row; avoid using it when only a single breakdown is needed.

-- ✅ Useful when you want both per-channel and total
GROUP BY GROUPING SETS ((channel), ())

-- ❌ Unnecessary when only one breakdown level is needed; a plain GROUP BY channel is simpler
GROUP BY GROUPING SETS ((channel))
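
For context, here is a minimal end-to-end sketch built on the pixel_joined_tvf columns from earlier. The empty () set produces a grand-total row, where ClickHouse fills the non-grouped channel column with its default value (an empty string):

-- ✅ Per-channel rows plus a grand-total row in one query
SELECT
  channel,
  SUM(spend) AS spend,
  SUM(order_revenue) AS revenue
FROM pixel_joined_tvf
WHERE event_date BETWEEN @start_date AND @end_date
GROUP BY GROUPING SETS ((channel), ())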