Aggregate Functions
Functions that compute a single result from multiple rows. In streaming SQL, aggregations work with time windows. Examples use the sample data.
COUNT
Count rows or specific values.
Syntax:
COUNT(*)- Count all rowsCOUNT(expression)- Count non-null valuesCOUNT(DISTINCT expression)- Count unique values
Example:
SELECT
TUMBLE(interval '1 minute') as window,
COUNT(*) as total_orders,
COUNT(customer_id) as orders_with_customer,
COUNT(DISTINCT customer_id) as unique_customers
FROM orders
GROUP BY window| window_start | window_end | total_orders | orders_with_customer | unique_customers |
|---|---|---|---|---|
| 10:00:00 | 10:01:00 | 2 | 2 | 2 |
| 10:01:00 | 10:02:00 | 2 | 2 | 2 |
| 10:02:00 | 10:03:00 | 2 | 2 | 2 |
| 10:03:00 | 10:04:00 | 1 | 1 | 1 |
| 10:04:00 | 10:05:00 | 1 | 1 | 1 |
SUM
Add up numeric values.
Syntax:
SUM(expression)- Sum of all non-null values
Example:
SELECT
region,
TUMBLE(interval '5 minutes') as window,
SUM(amount) as total_revenue,
SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) as completed_revenue
FROM orders
GROUP BY region, window| region | window_start | window_end | total_revenue | completed_revenue |
|---|---|---|---|---|
| us-east | 10:00:00 | 10:05:00 | 564.99 | 364.99 |
| us-west | 10:00:00 | 10:05:00 | 375.50 | 375.50 |
| eu-west | 10:00:00 | 10:05:00 | 500.00 | 0.00 |
AVG
Calculate the arithmetic mean.
Syntax:
AVG(expression)- Average of non-null values
Example:
SELECT
status,
TUMBLE(interval '5 minutes') as window,
AVG(amount) as avg_order_amount,
COUNT(*) as order_count
FROM orders
GROUP BY status, window| status | window_start | window_end | avg_order_amount | order_count |
|---|---|---|---|---|
| completed | 10:00:00 | 10:05:00 | 148.10 | 5 |
| pending | 10:00:00 | 10:05:00 | 325.00 | 2 |
| cancelled | 10:00:00 | 10:05:00 | 50.00 | 1 |
MEAN
Alias for AVG.
Syntax:
MEAN(expression)- Same as AVG
Example:
SELECT
name,
TUMBLE(interval '5 minutes') as window,
MEAN(value) as mean_value
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window| name | window_start | window_end | mean_value |
|---|---|---|---|
| cpu_usage | 10:00:00 | 10:05:00 | 54.70 |
MEDIAN
Find the middle value.
Syntax:
MEDIAN(expression)- Middle value (50th percentile)
Example:
SELECT
name,
TUMBLE(interval '5 minutes') as window,
MEDIAN(value) as median_value,
AVG(value) as avg_value
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window| name | window_start | window_end | median_value | avg_value |
|---|---|---|---|---|
| cpu_usage | 10:00:00 | 10:05:00 | 52.15 | 54.70 |
MIN
Find the minimum value.
Syntax:
MIN(expression)- Smallest non-null value
Example:
SELECT
name,
TUMBLE(interval '5 minutes') as window,
MIN(value) as min_value,
MAX(value) as max_value
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window| name | window_start | window_end | min_value | max_value |
|---|---|---|---|---|
| cpu_usage | 10:00:00 | 10:05:00 | 38.7 | 71.2 |
MAX
Find the maximum value.
Syntax:
MAX(expression)- Largest non-null value
Example:
SELECT
region,
TUMBLE(interval '5 minutes') as window,
MAX(amount) as largest_order,
MIN(amount) as smallest_order
FROM orders
GROUP BY region, window| region | window_start | window_end | largest_order | smallest_order |
|---|---|---|---|---|
| us-east | 10:00:00 | 10:05:00 | 200.00 | 89.99 |
| us-west | 10:00:00 | 10:05:00 | 300.00 | 75.50 |
| eu-west | 10:00:00 | 10:05:00 | 450.00 | 50.00 |
BOOL_AND
Returns true if ALL values are true.
Syntax:
BOOL_AND(expression)- Logical AND across all rows
Example:
SELECT
service,
TUMBLE(interval '5 minutes') as window,
BOOL_AND(level != 'ERROR') as all_healthy,
COUNT(*) as log_count
FROM logs
GROUP BY service, window| service | window_start | window_end | all_healthy | log_count |
|---|---|---|---|---|
| auth | 10:00:00 | 10:05:00 | true | 1 |
| payment | 10:00:00 | 10:05:00 | false | 1 |
| api | 10:00:00 | 10:05:00 | true | 1 |
| orders | 10:00:00 | 10:05:00 | true | 1 |
| inventory | 10:00:00 | 10:05:00 | false | 1 |
| cache | 10:00:00 | 10:05:00 | true | 1 |
BOOL_OR
Returns true if ANY value is true.
Syntax:
BOOL_OR(expression)- Logical OR across all rows
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
BOOL_OR(level = 'ERROR') as has_errors,
BOOL_OR(level = 'WARN') as has_warnings
FROM logs
GROUP BY window| window_start | window_end | has_errors | has_warnings |
|---|---|---|---|
| 10:00:00 | 10:05:00 | true | true |
BIT_AND
Bitwise AND of all values.
Syntax:
BIT_AND(expression)- Bitwise AND across all integer values
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
BIT_AND(CAST(value as INTEGER)) as bit_and_result
FROM metrics
WHERE name = 'requests'
GROUP BY window| window_start | window_end | bit_and_result |
|---|---|---|
| 10:00:00 | 10:05:00 | 1250 |
BIT_OR
Bitwise OR of all values.
Syntax:
BIT_OR(expression)- Bitwise OR across all integer values
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
BIT_OR(CAST(value as INTEGER)) as bit_or_result
FROM metrics
WHERE name IN ('cpu_usage', 'memory_mb')
GROUP BY window| window_start | window_end | bit_or_result |
|---|---|---|
| 10:00:00 | 10:05:00 | 2175 |
BIT_XOR
Bitwise XOR of all values.
Syntax:
BIT_XOR(expression)- Bitwise XOR across all integer values
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
BIT_XOR(CAST(value as INTEGER)) as bit_xor_result
FROM metrics
WHERE name = 'memory_mb'
GROUP BY window| window_start | window_end | bit_xor_result |
|---|---|---|
| 10:00:00 | 10:05:00 | 108 |
ARRAY_AGG
Collect values into an array.
Syntax:
ARRAY_AGG(expression)- Collect all valuesARRAY_AGG(expression ORDER BY ...)- Collect in orderARRAY_AGG(DISTINCT expression)- Collect unique values
Example:
SELECT
customer_id,
TUMBLE(interval '1 hour') as window,
ARRAY_AGG(order_id ORDER BY timestamp) as order_ids,
ARRAY_AGG(amount ORDER BY timestamp) as amounts
FROM orders
GROUP BY customer_id, window| customer_id | window_start | window_end | order_ids | amounts |
|---|---|---|---|---|
| c1 | 10:00:00 | 11:00:00 | [ord_1, ord_3, ord_7] | [150.00, 200.00, 89.99] |
| c2 | 10:00:00 | 11:00:00 | [ord_2, ord_5] | [75.50, 300.00] |
| c3 | 10:00:00 | 11:00:00 | [ord_4] | [50.00] |
| c4 | 10:00:00 | 11:00:00 | [ord_6] | [125.00] |
| c5 | 10:00:00 | 11:00:00 | [ord_8] | [450.00] |
FIRST_VALUE
Get the first value in a group.
Syntax:
FIRST_VALUE(expression ORDER BY ...)- First value after ordering
Example:
SELECT
user_id,
SESSION(interval '30 minutes') as session,
FIRST_VALUE(page ORDER BY timestamp) as landing_page,
FIRST_VALUE(event_type ORDER BY timestamp) as first_action
FROM events
GROUP BY user_id, session| user_id | session_start | session_end | landing_page | first_action |
|---|---|---|---|---|
| u1 | 10:00:05 | 10:00:45 | /home | page_view |
| u2 | 10:00:18 | 10:01:15 | /home | page_view |
| u3 | 10:01:00 | 10:01:00 | /home | page_view |
LAST_VALUE
Get the last value in a group.
Syntax:
LAST_VALUE(expression ORDER BY ...)- Last value after ordering
Example:
SELECT
user_id,
SESSION(interval '30 minutes') as session,
LAST_VALUE(page ORDER BY timestamp) as exit_page,
LAST_VALUE(event_type ORDER BY timestamp) as last_action
FROM events
GROUP BY user_id, session| user_id | session_start | session_end | exit_page | last_action |
|---|---|---|---|---|
| u1 | 10:00:05 | 10:00:45 | /checkout | purchase |
| u2 | 10:00:18 | 10:01:15 | /products | page_view |
| u3 | 10:01:00 | 10:01:00 | /home | page_view |
VAR / VAR_SAMP
Sample variance.
Syntax:
VAR(expression)- Sample varianceVAR_SAMP(expression)- Same as VAR
Example:
SELECT
name,
TUMBLE(interval '5 minutes') as window,
VAR_SAMP(value) as sample_variance,
COUNT(*) as sample_count
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window| name | window_start | window_end | sample_variance | sample_count |
|---|---|---|---|---|
| cpu_usage | 10:00:00 | 10:05:00 | 165.13 | 5 |
VAR_POP
Population variance.
Syntax:
VAR_POP(expression)- Population variance (divides by N)
Example:
SELECT
name,
TUMBLE(interval '5 minutes') as window,
VAR_POP(value) as pop_variance,
VAR_SAMP(value) as sample_variance
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window| name | window_start | window_end | pop_variance | sample_variance |
|---|---|---|---|---|
| cpu_usage | 10:00:00 | 10:05:00 | 132.10 | 165.13 |
STDDEV / STDDEV_SAMP
Sample standard deviation.
Syntax:
STDDEV(expression)- Sample standard deviationSTDDEV_SAMP(expression)- Same as STDDEV
Example:
SELECT
name,
TUMBLE(interval '5 minutes') as window,
AVG(value) as mean,
STDDEV(value) as stddev
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window| name | window_start | window_end | mean | stddev |
|---|---|---|---|---|
| cpu_usage | 10:00:00 | 10:05:00 | 54.70 | 12.85 |
STDDEV_POP
Population standard deviation.
Syntax:
STDDEV_POP(expression)- Population standard deviation
Example:
SELECT
name,
TUMBLE(interval '5 minutes') as window,
STDDEV_POP(value) as pop_stddev,
STDDEV_SAMP(value) as sample_stddev
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window| name | window_start | window_end | pop_stddev | sample_stddev |
|---|---|---|---|---|
| cpu_usage | 10:00:00 | 10:05:00 | 11.49 | 12.85 |
CORR
Correlation coefficient between two variables.
Syntax:
CORR(y, x)- Pearson correlation coefficient (-1 to 1)
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
CORR(m1.value, m2.value) as cpu_memory_correlation
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
AND m1.tags = m2.tags
GROUP BY window| window_start | window_end | cpu_memory_correlation |
|---|---|---|
| 10:00:00 | 10:05:00 | 0.87 |
COVAR / COVAR_SAMP
Sample covariance.
Syntax:
COVAR(y, x)- Sample covarianceCOVAR_SAMP(y, x)- Same as COVAR
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
COVAR_SAMP(m1.value, m2.value) as covariance
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
AND m1.tags = m2.tags
GROUP BY window| window_start | window_end | covariance |
|---|---|---|
| 10:00:00 | 10:05:00 | 1247.50 |
COVAR_POP
Population covariance.
Syntax:
COVAR_POP(y, x)- Population covariance
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
COVAR_POP(m1.value, m2.value) as pop_covariance,
COVAR_SAMP(m1.value, m2.value) as sample_covariance
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
AND m1.tags = m2.tags
GROUP BY window| window_start | window_end | pop_covariance | sample_covariance |
|---|---|---|---|
| 10:00:00 | 10:05:00 | 623.75 | 1247.50 |
REGR_SLOPE
Slope of linear regression line.
Syntax:
REGR_SLOPE(y, x)- Slope coefficient (change in Y per unit X)
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
REGR_SLOPE(m1.value, m2.value) as slope
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
AND m1.tags = m2.tags
GROUP BY window| window_start | window_end | slope |
|---|---|---|
| 10:00:00 | 10:05:00 | 0.21 |
REGR_INTERCEPT
Y-intercept of linear regression line.
Syntax:
REGR_INTERCEPT(y, x)- Intercept (Y when X=0)
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
REGR_INTERCEPT(m1.value, m2.value) as intercept,
REGR_SLOPE(m1.value, m2.value) as slope
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
AND m1.tags = m2.tags
GROUP BY window| window_start | window_end | intercept | slope |
|---|---|---|---|
| 10:00:00 | 10:05:00 | -387.42 | 0.21 |
REGR_R2
R-squared (coefficient of determination).
Syntax:
REGR_R2(y, x)- How well the regression fits (0 to 1)
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
REGR_R2(m1.value, m2.value) as r_squared
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
AND m1.tags = m2.tags
GROUP BY window| window_start | window_end | r_squared |
|---|---|---|
| 10:00:00 | 10:05:00 | 0.76 |
REGR_COUNT
Count of non-null pairs.
Syntax:
REGR_COUNT(y, x)- Number of rows with non-null X and Y
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
REGR_COUNT(m1.value, m2.value) as pair_count
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
AND m1.tags = m2.tags
GROUP BY window| window_start | window_end | pair_count |
|---|---|---|
| 10:00:00 | 10:05:00 | 2 |
REGR_AVGX / REGR_AVGY
Average of X or Y values in regression.
Syntax:
REGR_AVGX(y, x)- Average of X valuesREGR_AVGY(y, x)- Average of Y values
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
REGR_AVGX(m1.value, m2.value) as avg_memory,
REGR_AVGY(m1.value, m2.value) as avg_cpu
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
AND m1.tags = m2.tags
GROUP BY window| window_start | window_end | avg_memory | avg_cpu |
|---|---|---|---|
| 10:00:00 | 10:05:00 | 2102.00 | 53.90 |
REGR_SXX / REGR_SYY / REGR_SXY
Sum of squares for regression.
Syntax:
REGR_SXX(y, x)- Sum of squares of X deviationsREGR_SYY(y, x)- Sum of squares of Y deviationsREGR_SXY(y, x)- Sum of products of deviations
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
REGR_SXX(m1.value, m2.value) as sxx,
REGR_SYY(m1.value, m2.value) as syy,
REGR_SXY(m1.value, m2.value) as sxy
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
AND m1.tags = m2.tags
GROUP BY window| window_start | window_end | sxx | syy | sxy |
|---|---|---|---|---|
| 10:00:00 | 10:05:00 | 5832.00 | 141.12 | 1247.50 |
APPROX_DISTINCT
Approximate count of distinct values using HyperLogLog.
Syntax:
APPROX_DISTINCT(expression)- Fast approximate unique count
Example:
SELECT
TUMBLE(interval '1 minute') as window,
COUNT(DISTINCT user_id) as exact_unique,
APPROX_DISTINCT(user_id) as approx_unique
FROM events
GROUP BY window| window_start | window_end | exact_unique | approx_unique |
|---|---|---|---|
| 10:00:00 | 10:01:00 | 3 | 3 |
| 10:01:00 | 10:02:00 | 2 | 2 |
Much faster than COUNT(DISTINCT) for high-cardinality columns.
APPROX_MEDIAN
Approximate median using t-digest.
Syntax:
APPROX_MEDIAN(expression)- Fast 50th percentile
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
APPROX_MEDIAN(amount) as approx_median,
MEDIAN(amount) as exact_median
FROM orders
GROUP BY window| window_start | window_end | approx_median | exact_median |
|---|---|---|---|
| 10:00:00 | 10:05:00 | 137.50 | 137.50 |
APPROX_PERCENTILE_CONT
Approximate percentile using t-digest.
Syntax:
APPROX_PERCENTILE_CONT(expression, percentile)- Percentile (0 to 1)APPROX_PERCENTILE_CONT(expression, percentile, centroids)- With precision
Example:
SELECT
name,
TUMBLE(interval '5 minutes') as window,
APPROX_PERCENTILE_CONT(value, 0.50) as p50,
APPROX_PERCENTILE_CONT(value, 0.90) as p90,
APPROX_PERCENTILE_CONT(value, 0.99) as p99
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window| name | window_start | window_end | p50 | p90 | p99 |
|---|---|---|---|---|---|
| cpu_usage | 10:00:00 | 10:05:00 | 52.15 | 68.74 | 70.95 |
APPROX_PERCENTILE_CONT_WITH_WEIGHT
Weighted approximate percentile.
Syntax:
APPROX_PERCENTILE_CONT_WITH_WEIGHT(value, weight, percentile)- Weighted percentile
Example:
SELECT
TUMBLE(interval '5 minutes') as window,
APPROX_PERCENTILE_CONT_WITH_WEIGHT(
value,
CAST(1 as DOUBLE),
0.95
) as weighted_p95
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY window| window_start | window_end | weighted_p95 |
|---|---|---|
| 10:00:00 | 10:05:00 | 69.52 |
Useful when aggregating pre-aggregated data with counts.
Patterns
Rate Calculations
SELECT
TUMBLE(interval '1 minute') as window,
COUNT(*) as total,
SUM(CASE WHEN level = 'ERROR' THEN 1 ELSE 0 END) as errors,
SUM(CASE WHEN level = 'ERROR' THEN 1 ELSE 0 END)::DOUBLE / COUNT(*) as error_rate
FROM logs
GROUP BY window| window_start | window_end | total | errors | error_rate |
|---|---|---|---|---|
| 10:00:00 | 10:01:00 | 1 | 0 | 0.00 |
| 10:01:00 | 10:02:00 | 2 | 1 | 0.50 |
| 10:02:00 | 10:03:00 | 2 | 1 | 0.50 |
| 10:03:00 | 10:04:00 | 1 | 0 | 0.00 |
Conditional Aggregates
SELECT
TUMBLE(interval '5 minutes') as window,
SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) as completed_revenue,
SUM(CASE WHEN status = 'pending' THEN amount ELSE 0 END) as pending_revenue,
COUNT(CASE WHEN status = 'cancelled' THEN 1 END) as cancelled_count
FROM orders
GROUP BY window| window_start | window_end | completed_revenue | pending_revenue | cancelled_count |
|---|---|---|---|---|
| 10:00:00 | 10:05:00 | 740.49 | 650.00 | 1 |
Null Handling
Most aggregates ignore NULL values:
SELECT
COUNT(*) as count_all,
COUNT(region) as count_non_null,
SUM(amount) as sum_amount,
AVG(amount) as avg_amount
FROM orders
GROUP BY TUMBLE(interval '1 hour')| count_all | count_non_null | sum_amount | avg_amount |
|---|---|---|---|
| 8 | 8 | 1440.49 | 180.06 |