Work in Progress: This page is under development. Use the feedback button on the bottom right to help us improve it.

Aggregate Functions

Functions that compute a single result from multiple rows. In streaming SQL, aggregations work with time windows. Examples use the sample data.


COUNT

Count rows or specific values.

Syntax:

  • COUNT(*) - Count all rows
  • COUNT(expression) - Count non-null values
  • COUNT(DISTINCT expression) - Count unique values

Example:

SELECT
  TUMBLE(interval '1 minute') as window,
  COUNT(*) as total_orders,
  COUNT(customer_id) as orders_with_customer,
  COUNT(DISTINCT customer_id) as unique_customers
FROM orders
GROUP BY window
window_startwindow_endtotal_ordersorders_with_customerunique_customers
10:00:0010:01:00222
10:01:0010:02:00222
10:02:0010:03:00222
10:03:0010:04:00111
10:04:0010:05:00111

SUM

Add up numeric values.

Syntax:

  • SUM(expression) - Sum of all non-null values

Example:

SELECT
  region,
  TUMBLE(interval '5 minutes') as window,
  SUM(amount) as total_revenue,
  SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) as completed_revenue
FROM orders
GROUP BY region, window
regionwindow_startwindow_endtotal_revenuecompleted_revenue
us-east10:00:0010:05:00564.99364.99
us-west10:00:0010:05:00375.50375.50
eu-west10:00:0010:05:00500.000.00

AVG

Calculate the arithmetic mean.

Syntax:

  • AVG(expression) - Average of non-null values

Example:

SELECT
  status,
  TUMBLE(interval '5 minutes') as window,
  AVG(amount) as avg_order_amount,
  COUNT(*) as order_count
FROM orders
GROUP BY status, window
statuswindow_startwindow_endavg_order_amountorder_count
completed10:00:0010:05:00148.105
pending10:00:0010:05:00325.002
cancelled10:00:0010:05:0050.001

MEAN

Alias for AVG.

Syntax:

  • MEAN(expression) - Same as AVG

Example:

SELECT
  name,
  TUMBLE(interval '5 minutes') as window,
  MEAN(value) as mean_value
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window
namewindow_startwindow_endmean_value
cpu_usage10:00:0010:05:0054.70

MEDIAN

Find the middle value.

Syntax:

  • MEDIAN(expression) - Middle value (50th percentile)

Example:

SELECT
  name,
  TUMBLE(interval '5 minutes') as window,
  MEDIAN(value) as median_value,
  AVG(value) as avg_value
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window
namewindow_startwindow_endmedian_valueavg_value
cpu_usage10:00:0010:05:0052.1554.70

MIN

Find the minimum value.

Syntax:

  • MIN(expression) - Smallest non-null value

Example:

SELECT
  name,
  TUMBLE(interval '5 minutes') as window,
  MIN(value) as min_value,
  MAX(value) as max_value
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window
namewindow_startwindow_endmin_valuemax_value
cpu_usage10:00:0010:05:0038.771.2

MAX

Find the maximum value.

Syntax:

  • MAX(expression) - Largest non-null value

Example:

SELECT
  region,
  TUMBLE(interval '5 minutes') as window,
  MAX(amount) as largest_order,
  MIN(amount) as smallest_order
FROM orders
GROUP BY region, window
regionwindow_startwindow_endlargest_ordersmallest_order
us-east10:00:0010:05:00200.0089.99
us-west10:00:0010:05:00300.0075.50
eu-west10:00:0010:05:00450.0050.00

BOOL_AND

Returns true if ALL values are true.

Syntax:

  • BOOL_AND(expression) - Logical AND across all rows

Example:

SELECT
  service,
  TUMBLE(interval '5 minutes') as window,
  BOOL_AND(level != 'ERROR') as all_healthy,
  COUNT(*) as log_count
FROM logs
GROUP BY service, window
servicewindow_startwindow_endall_healthylog_count
auth10:00:0010:05:00true1
payment10:00:0010:05:00false1
api10:00:0010:05:00true1
orders10:00:0010:05:00true1
inventory10:00:0010:05:00false1
cache10:00:0010:05:00true1

BOOL_OR

Returns true if ANY value is true.

Syntax:

  • BOOL_OR(expression) - Logical OR across all rows

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  BOOL_OR(level = 'ERROR') as has_errors,
  BOOL_OR(level = 'WARN') as has_warnings
FROM logs
GROUP BY window
window_startwindow_endhas_errorshas_warnings
10:00:0010:05:00truetrue

BIT_AND

Bitwise AND of all values.

Syntax:

  • BIT_AND(expression) - Bitwise AND across all integer values

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  BIT_AND(CAST(value as INTEGER)) as bit_and_result
FROM metrics
WHERE name = 'requests'
GROUP BY window
window_startwindow_endbit_and_result
10:00:0010:05:001250

BIT_OR

Bitwise OR of all values.

Syntax:

  • BIT_OR(expression) - Bitwise OR across all integer values

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  BIT_OR(CAST(value as INTEGER)) as bit_or_result
FROM metrics
WHERE name IN ('cpu_usage', 'memory_mb')
GROUP BY window
window_startwindow_endbit_or_result
10:00:0010:05:002175

BIT_XOR

Bitwise XOR of all values.

Syntax:

  • BIT_XOR(expression) - Bitwise XOR across all integer values

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  BIT_XOR(CAST(value as INTEGER)) as bit_xor_result
FROM metrics
WHERE name = 'memory_mb'
GROUP BY window
window_startwindow_endbit_xor_result
10:00:0010:05:00108

ARRAY_AGG

Collect values into an array.

Syntax:

  • ARRAY_AGG(expression) - Collect all values
  • ARRAY_AGG(expression ORDER BY ...) - Collect in order
  • ARRAY_AGG(DISTINCT expression) - Collect unique values

Example:

SELECT
  customer_id,
  TUMBLE(interval '1 hour') as window,
  ARRAY_AGG(order_id ORDER BY timestamp) as order_ids,
  ARRAY_AGG(amount ORDER BY timestamp) as amounts
FROM orders
GROUP BY customer_id, window
customer_idwindow_startwindow_endorder_idsamounts
c110:00:0011:00:00[ord_1, ord_3, ord_7][150.00, 200.00, 89.99]
c210:00:0011:00:00[ord_2, ord_5][75.50, 300.00]
c310:00:0011:00:00[ord_4][50.00]
c410:00:0011:00:00[ord_6][125.00]
c510:00:0011:00:00[ord_8][450.00]

FIRST_VALUE

Get the first value in a group.

Syntax:

  • FIRST_VALUE(expression ORDER BY ...) - First value after ordering

Example:

SELECT
  user_id,
  SESSION(interval '30 minutes') as session,
  FIRST_VALUE(page ORDER BY timestamp) as landing_page,
  FIRST_VALUE(event_type ORDER BY timestamp) as first_action
FROM events
GROUP BY user_id, session
user_idsession_startsession_endlanding_pagefirst_action
u110:00:0510:00:45/homepage_view
u210:00:1810:01:15/homepage_view
u310:01:0010:01:00/homepage_view

LAST_VALUE

Get the last value in a group.

Syntax:

  • LAST_VALUE(expression ORDER BY ...) - Last value after ordering

Example:

SELECT
  user_id,
  SESSION(interval '30 minutes') as session,
  LAST_VALUE(page ORDER BY timestamp) as exit_page,
  LAST_VALUE(event_type ORDER BY timestamp) as last_action
FROM events
GROUP BY user_id, session
user_idsession_startsession_endexit_pagelast_action
u110:00:0510:00:45/checkoutpurchase
u210:00:1810:01:15/productspage_view
u310:01:0010:01:00/homepage_view

VAR / VAR_SAMP

Sample variance.

Syntax:

  • VAR(expression) - Sample variance
  • VAR_SAMP(expression) - Same as VAR

Example:

SELECT
  name,
  TUMBLE(interval '5 minutes') as window,
  VAR_SAMP(value) as sample_variance,
  COUNT(*) as sample_count
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window
namewindow_startwindow_endsample_variancesample_count
cpu_usage10:00:0010:05:00165.135

VAR_POP

Population variance.

Syntax:

  • VAR_POP(expression) - Population variance (divides by N)

Example:

SELECT
  name,
  TUMBLE(interval '5 minutes') as window,
  VAR_POP(value) as pop_variance,
  VAR_SAMP(value) as sample_variance
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window
namewindow_startwindow_endpop_variancesample_variance
cpu_usage10:00:0010:05:00132.10165.13

STDDEV / STDDEV_SAMP

Sample standard deviation.

Syntax:

  • STDDEV(expression) - Sample standard deviation
  • STDDEV_SAMP(expression) - Same as STDDEV

Example:

SELECT
  name,
  TUMBLE(interval '5 minutes') as window,
  AVG(value) as mean,
  STDDEV(value) as stddev
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window
namewindow_startwindow_endmeanstddev
cpu_usage10:00:0010:05:0054.7012.85

STDDEV_POP

Population standard deviation.

Syntax:

  • STDDEV_POP(expression) - Population standard deviation

Example:

SELECT
  name,
  TUMBLE(interval '5 minutes') as window,
  STDDEV_POP(value) as pop_stddev,
  STDDEV_SAMP(value) as sample_stddev
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window
namewindow_startwindow_endpop_stddevsample_stddev
cpu_usage10:00:0010:05:0011.4912.85

CORR

Correlation coefficient between two variables.

Syntax:

  • CORR(y, x) - Pearson correlation coefficient (-1 to 1)

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  CORR(m1.value, m2.value) as cpu_memory_correlation
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
  AND m1.tags = m2.tags
GROUP BY window
window_startwindow_endcpu_memory_correlation
10:00:0010:05:000.87

COVAR / COVAR_SAMP

Sample covariance.

Syntax:

  • COVAR(y, x) - Sample covariance
  • COVAR_SAMP(y, x) - Same as COVAR

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  COVAR_SAMP(m1.value, m2.value) as covariance
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
  AND m1.tags = m2.tags
GROUP BY window
window_startwindow_endcovariance
10:00:0010:05:001247.50

COVAR_POP

Population covariance.

Syntax:

  • COVAR_POP(y, x) - Population covariance

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  COVAR_POP(m1.value, m2.value) as pop_covariance,
  COVAR_SAMP(m1.value, m2.value) as sample_covariance
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
  AND m1.tags = m2.tags
GROUP BY window
window_startwindow_endpop_covariancesample_covariance
10:00:0010:05:00623.751247.50

REGR_SLOPE

Slope of linear regression line.

Syntax:

  • REGR_SLOPE(y, x) - Slope coefficient (change in Y per unit X)

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  REGR_SLOPE(m1.value, m2.value) as slope
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
  AND m1.tags = m2.tags
GROUP BY window
window_startwindow_endslope
10:00:0010:05:000.21

REGR_INTERCEPT

Y-intercept of linear regression line.

Syntax:

  • REGR_INTERCEPT(y, x) - Intercept (Y when X=0)

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  REGR_INTERCEPT(m1.value, m2.value) as intercept,
  REGR_SLOPE(m1.value, m2.value) as slope
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
  AND m1.tags = m2.tags
GROUP BY window
window_startwindow_endinterceptslope
10:00:0010:05:00-387.420.21

REGR_R2

R-squared (coefficient of determination).

Syntax:

  • REGR_R2(y, x) - How well the regression fits (0 to 1)

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  REGR_R2(m1.value, m2.value) as r_squared
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
  AND m1.tags = m2.tags
GROUP BY window
window_startwindow_endr_squared
10:00:0010:05:000.76

REGR_COUNT

Count of non-null pairs.

Syntax:

  • REGR_COUNT(y, x) - Number of rows with non-null X and Y

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  REGR_COUNT(m1.value, m2.value) as pair_count
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
  AND m1.tags = m2.tags
GROUP BY window
window_startwindow_endpair_count
10:00:0010:05:002

REGR_AVGX / REGR_AVGY

Average of X or Y values in regression.

Syntax:

  • REGR_AVGX(y, x) - Average of X values
  • REGR_AVGY(y, x) - Average of Y values

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  REGR_AVGX(m1.value, m2.value) as avg_memory,
  REGR_AVGY(m1.value, m2.value) as avg_cpu
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
  AND m1.tags = m2.tags
GROUP BY window
window_startwindow_endavg_memoryavg_cpu
10:00:0010:05:002102.0053.90

REGR_SXX / REGR_SYY / REGR_SXY

Sum of squares for regression.

Syntax:

  • REGR_SXX(y, x) - Sum of squares of X deviations
  • REGR_SYY(y, x) - Sum of squares of Y deviations
  • REGR_SXY(y, x) - Sum of products of deviations

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  REGR_SXX(m1.value, m2.value) as sxx,
  REGR_SYY(m1.value, m2.value) as syy,
  REGR_SXY(m1.value, m2.value) as sxy
FROM metrics m1
JOIN metrics m2 ON m1.timestamp = m2.timestamp
WHERE m1.name = 'cpu_usage' AND m2.name = 'memory_mb'
  AND m1.tags = m2.tags
GROUP BY window
window_startwindow_endsxxsyysxy
10:00:0010:05:005832.00141.121247.50

APPROX_DISTINCT

Approximate count of distinct values using HyperLogLog.

Syntax:

  • APPROX_DISTINCT(expression) - Fast approximate unique count

Example:

SELECT
  TUMBLE(interval '1 minute') as window,
  COUNT(DISTINCT user_id) as exact_unique,
  APPROX_DISTINCT(user_id) as approx_unique
FROM events
GROUP BY window
window_startwindow_endexact_uniqueapprox_unique
10:00:0010:01:0033
10:01:0010:02:0022

Much faster than COUNT(DISTINCT) for high-cardinality columns.


APPROX_MEDIAN

Approximate median using t-digest.

Syntax:

  • APPROX_MEDIAN(expression) - Fast 50th percentile

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  APPROX_MEDIAN(amount) as approx_median,
  MEDIAN(amount) as exact_median
FROM orders
GROUP BY window
window_startwindow_endapprox_medianexact_median
10:00:0010:05:00137.50137.50

APPROX_PERCENTILE_CONT

Approximate percentile using t-digest.

Syntax:

  • APPROX_PERCENTILE_CONT(expression, percentile) - Percentile (0 to 1)
  • APPROX_PERCENTILE_CONT(expression, percentile, centroids) - With precision

Example:

SELECT
  name,
  TUMBLE(interval '5 minutes') as window,
  APPROX_PERCENTILE_CONT(value, 0.50) as p50,
  APPROX_PERCENTILE_CONT(value, 0.90) as p90,
  APPROX_PERCENTILE_CONT(value, 0.99) as p99
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY name, window
namewindow_startwindow_endp50p90p99
cpu_usage10:00:0010:05:0052.1568.7470.95

APPROX_PERCENTILE_CONT_WITH_WEIGHT

Weighted approximate percentile.

Syntax:

  • APPROX_PERCENTILE_CONT_WITH_WEIGHT(value, weight, percentile) - Weighted percentile

Example:

SELECT
  TUMBLE(interval '5 minutes') as window,
  APPROX_PERCENTILE_CONT_WITH_WEIGHT(
    value,
    CAST(1 as DOUBLE),
    0.95
  ) as weighted_p95
FROM metrics
WHERE name = 'cpu_usage'
GROUP BY window
window_startwindow_endweighted_p95
10:00:0010:05:0069.52

Useful when aggregating pre-aggregated data with counts.


Patterns

Rate Calculations

SELECT
  TUMBLE(interval '1 minute') as window,
  COUNT(*) as total,
  SUM(CASE WHEN level = 'ERROR' THEN 1 ELSE 0 END) as errors,
  SUM(CASE WHEN level = 'ERROR' THEN 1 ELSE 0 END)::DOUBLE / COUNT(*) as error_rate
FROM logs
GROUP BY window
window_startwindow_endtotalerrorserror_rate
10:00:0010:01:00100.00
10:01:0010:02:00210.50
10:02:0010:03:00210.50
10:03:0010:04:00100.00

Conditional Aggregates

SELECT
  TUMBLE(interval '5 minutes') as window,
  SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) as completed_revenue,
  SUM(CASE WHEN status = 'pending' THEN amount ELSE 0 END) as pending_revenue,
  COUNT(CASE WHEN status = 'cancelled' THEN 1 END) as cancelled_count
FROM orders
GROUP BY window
window_startwindow_endcompleted_revenuepending_revenuecancelled_count
10:00:0010:05:00740.49650.001

Null Handling

Most aggregates ignore NULL values:

SELECT
  COUNT(*) as count_all,
  COUNT(region) as count_non_null,
  SUM(amount) as sum_amount,
  AVG(amount) as avg_amount
FROM orders
GROUP BY TUMBLE(interval '1 hour')
count_allcount_non_nullsum_amountavg_amount
881440.49180.06