Data Processing With PostgreSQL Window Functions

You can use window functions in PostgreSQL or TimescaleDB to perform complex calculations across a set of rows (called a “window”) related to the current row.

A window function, also called an analytic function, uses the values from one or more rows in a database table to perform a calculation and return a value for each row.

Window functions are different from aggregate functions because the rows aren’t grouped into a single output. In a window function, each row can remain separate, but the function has access to more than just the data in the current row.

A window function call always includes an OVER clause directly after the function name and its arguments. This clause is what makes the window function different from a normal function or aggregate. The OVER clause determines how the rows of the query are split into partitions and which rows form the window frame for each calculation. When you use a window function, each row's value is computed from rows in the same partition as the current row.

You can use PARTITION BY and ORDER BY inside the OVER clause. PARTITION BY defines the criteria that rows must match to belong to the same partition as the current row. ORDER BY determines the order of the rows within each partition.

OVER, PARTITION BY, and ORDER BY syntax:

OVER ([PARTITION BY <columns>] [ORDER BY <columns>])

ROWS BETWEEN is used to specify a window frame in relation to the current row.

ROWS BETWEEN syntax:

OVER ([PARTITION BY <columns>] [ORDER BY <columns>] [ROWS BETWEEN <lower_bound> AND <upper_bound>])

The bounds in ROWS BETWEEN can be any one of these five options:

UNBOUNDED PRECEDING: All rows before the current row.
n PRECEDING: n rows before the current row.
CURRENT ROW: Just the current row.
n FOLLOWING: n rows after the current row.
UNBOUNDED FOLLOWING: All rows after the current row.
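
As a rough illustration of how these bounds select rows, the sketch below sums a frame of one row before through one row after the current row, and a frame from the current row to the end. It runs the SQL through Python's built-in sqlite3 module as a stand-in for PostgreSQL, an assumption made only so the example is self-contained. The table t and its column n are hypothetical.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,), (4,), (5,)])

rows = conn.execute("""
    SELECT n,
           SUM(n) OVER (ORDER BY n
                        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbors,
           SUM(n) OVER (ORDER BY n
                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining
    FROM t
    ORDER BY n
""").fetchall()

for row in rows:
    print(row)  # (n, neighbors, remaining)
```

For n = 3, the first frame covers 2, 3, and 4, and the second covers 3, 4, and 5. At the edges the frame simply has fewer rows, so n = 1 sums only 1 and 2.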

Use WINDOW to create a named window clause that separates the window definition from the window function call in the SELECT list.

WINDOW syntax:

SELECT <window_function> OVER w FROM <table> WINDOW w AS ([PARTITION BY <columns>] [ORDER BY <columns>] [ROWS BETWEEN <lower_bound> AND <upper_bound>])

Examples

Using a window function over all the rows of a result set
Ordering the records in a window frame
Partitioning the records in a window frame
Ordering and partitioning the records in a window frame
Using a window clause
Using ROWS BETWEEN in a window clause

These examples use sales data in a database table called sales_data, like this:

id	sale_time	branch	item	quantity	total
1	2021-08-11	New York	Watch	1	100
2	2021-08-11	Chicago	Watch	2	200
3	2021-08-12	Chicago	Necklace	3	600
4	2021-08-13	Phoenix	Ring	1	250
5	2021-08-13	New York	Ring	1	250
6	2021-08-14	Miami	Watch	2	200

Using a window function over all the rows of a result set

If you use OVER without a PARTITION BY, ORDER BY, or ROWS clause, the calculation is performed on a window containing all the rows in the result set. Here is an example query that shows the overall sales total next to each branch:

SELECT branch, SUM(total) OVER() AS sum FROM sales_data;

Results:

branch	sum
New York	1600
Chicago	1600
Chicago	1600
Phoenix	1600
New York	1600
Miami	1600

The amount in the sum column is the sum of the total column across all the rows in the table.
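
One common use of an empty OVER() is to compare each row with the overall total. The sketch below computes each sale's percentage of all sales. It assumes a trimmed copy of sales_data (only id, branch, and total) loaded into SQLite through Python's sqlite3 module, which stands in for PostgreSQL here. The pct_of_all alias is illustrative.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (id INTEGER, branch TEXT, total INTEGER)")
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", [
    (1, "New York", 100), (2, "Chicago", 200), (3, "Chicago", 600),
    (4, "Phoenix", 250), (5, "New York", 250), (6, "Miami", 200),
])

# Every row sees the same grand total, so it can be used as a denominator.
shares = conn.execute("""
    SELECT id, branch, total,
           100.0 * total / SUM(total) OVER () AS pct_of_all
    FROM sales_data
    ORDER BY id
""").fetchall()

for row in shares:
    print(row)
```

The Chicago necklace sale (total 600) accounts for 37.5 percent of the 1600 total, and the percentages add up to 100.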

Ordering the records in a window frame

If you combine an ORDER BY clause with OVER, aggregation is performed against the current row and all previous rows in the result set. This is because, by default, a window frame with ORDER BY starts at UNBOUNDED PRECEDING and ends at the current row.

This example query also gets a summary of sales, but it orders the rows in the window by the id column:

SELECT branch, SUM(total) OVER(ORDER BY id) AS sum FROM sales_data;

Results:

branch	sum
New York	100
Chicago	300
Chicago	900
Phoenix	1150
New York	1400
Miami	1600

The amount in the sum column is a running total of sales.

If you order the results by a column that contains duplicate values, the results turn out differently. For example:

SELECT branch, SUM(total) OVER(ORDER BY sale_time) AS sum FROM sales_data;

Results:

branch	sum
New York	300
Chicago	300
Chicago	900
Phoenix	1400
New York	1400
Miami	1600

The sum is still a running total, but it is not the same as in the previous example. That is because the default frame includes all preceding rows plus any peer rows, which are rows whose sale_time matches that of the current row.
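
If you want a strictly row-by-row running total even when sale times repeat, one option is to add a unique column as a tie-breaker in ORDER BY so that no two rows are peers. The sketch below compares both forms. It assumes a trimmed copy of sales_data loaded into SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, with sale_time stored as ISO-formatted text.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (id INTEGER, sale_time TEXT, total INTEGER)")
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", [
    (1, "2021-08-11", 100), (2, "2021-08-11", 200), (3, "2021-08-12", 600),
    (4, "2021-08-13", 250), (5, "2021-08-13", 250), (6, "2021-08-14", 200),
])

rows = conn.execute("""
    SELECT id,
           SUM(total) OVER (ORDER BY sale_time)     AS with_peers,
           SUM(total) OVER (ORDER BY sale_time, id) AS tie_broken
    FROM sales_data
    ORDER BY id
""").fetchall()

with_peers = [r[1] for r in rows]
tie_broken = [r[2] for r in rows]
print(with_peers)
print(tie_broken)
```

Rows 1 and 2 share a sale time, so with_peers reports 300 for both, while tie_broken steps through 100 and then 300.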

Partitioning the records in a window frame

PARTITION BY works like GROUP BY in a window frame. It groups all the results by the condition you set. This example uses GROUP BY to get a sum of sales for each branch in the data:

SELECT branch, SUM(total) AS sum FROM sales_data sd GROUP BY branch;

Results:

branch	sum
Chicago	800
New York	350
Miami	200
Phoenix	250

This example uses PARTITION BY on the window frame:

SELECT id, branch, SUM(total) OVER(PARTITION BY branch) AS sum FROM sales_data;

Results:

id	branch	sum
2	Chicago	800
3	Chicago	800
6	Miami	200
1	New York	350
5	New York	350
4	Phoenix	250

The sums are the same in both examples, but the second example keeps every row instead of collapsing each branch into a single row.
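
Because each row survives, a partitioned window lets you compare a row with its own group. The sketch below subtracts the branch average from each sale. It assumes a trimmed copy of sales_data loaded into SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, and the diff_from_branch_avg alias is illustrative.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (id INTEGER, branch TEXT, total INTEGER)")
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", [
    (1, "New York", 100), (2, "Chicago", 200), (3, "Chicago", 600),
    (4, "Phoenix", 250), (5, "New York", 250), (6, "Miami", 200),
])

# The detail column (total) and the per-branch aggregate appear in the same row.
rows = conn.execute("""
    SELECT id, branch, total,
           total - AVG(total) OVER (PARTITION BY branch) AS diff_from_branch_avg
    FROM sales_data
    ORDER BY id
""").fetchall()

diffs = [r[3] for r in rows]
print(diffs)
```

A plain GROUP BY query cannot return this result directly, because the individual total values are no longer available once the rows are grouped.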

Ordering and partitioning the records in a window frame

When you use both ORDER BY and PARTITION BY in OVER, you can specify the order of the rows in each partition to which you apply the window function. This example retrieves a running total of sales for each branch in the data set:

SELECT sale_time, branch, SUM(total) OVER(PARTITION BY branch ORDER BY sale_time) AS sum FROM sales_data;

Results:

sale_time	branch	sum
2021-08-11	Chicago	200
2021-08-12	Chicago	800
2021-08-14	Miami	200
2021-08-11	New York	100
2021-08-13	New York	350
2021-08-13	Phoenix	250

Using a window clause

If you don’t want to define the window inline, you can move the definition into a window clause. This is useful if you want to use multiple window functions that share the same window in your query. Here is the previous example query rewritten with a window clause. It returns the same results:

SELECT sale_time, branch, SUM(total) OVER w AS sum
FROM sales_data
WINDOW w AS (PARTITION BY branch ORDER BY sale_time);
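
To show where a window clause pays off, the sketch below applies two window functions to the same named window. It assumes a trimmed copy of sales_data loaded into SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. The running_total and sales_so_far aliases are illustrative.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (sale_time TEXT, branch TEXT, total INTEGER)")
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", [
    ("2021-08-11", "New York", 100), ("2021-08-11", "Chicago", 200),
    ("2021-08-12", "Chicago", 600), ("2021-08-13", "Phoenix", 250),
    ("2021-08-13", "New York", 250), ("2021-08-14", "Miami", 200),
])

# The window w is defined once and reused by both functions.
rows = conn.execute("""
    SELECT sale_time, branch,
           SUM(total) OVER w AS running_total,
           COUNT(*)   OVER w AS sales_so_far
    FROM sales_data
    WINDOW w AS (PARTITION BY branch ORDER BY sale_time)
    ORDER BY branch, sale_time
""").fetchall()

for row in rows:
    print(row)
```

Defining the window once keeps the two calculations consistent: changing the partitioning or ordering in w changes both columns together.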

Using ROWS BETWEEN in a window clause

These examples use a dataset containing precipitation and temperature data for two cities over five days. This data is in a table called city_data:

date	city	temperature	precipitation
2021-09-01	Miami	65.30	0.28
2021-09-01	Atlanta	63.14	0.20
2021-09-02	Miami	64.40	0.79
2021-09-02	Atlanta	62.60	0.59
2021-09-03	Miami	68.18	0.47
2021-09-03	Atlanta	66.20	0.39
2021-09-04	Miami	68.36	0.00
2021-09-04	Atlanta	67.28	0.00
2021-09-05	Miami	72.50	0.00
2021-09-05	Atlanta	68.72	0.00

When you use ROWS BETWEEN in a window clause, the ORDER BY clause works a bit differently.

When you use ORDER BY in your window definition, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. However, if you don’t use ORDER BY, every row in the partition counts as a peer of the current row, so the default frame is equivalent to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

It’s important to think about how you want to use the ORDER BY clause in your window frame, especially when you also are using a ROWS clause.

For example, if you want to calculate a three-day moving average of the temperatures in each city, you can use this query:

SELECT city, date, temperature,
    AVG(temperature) OVER (
      PARTITION BY city
      ORDER BY date DESC
      ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) avg_3day
FROM city_data
ORDER BY city, date;

To get a three-day moving average of the temperature for each city, this query first partitions the window by city. It then orders the rows in each city partition by date so that a three-day set of rows can be selected relative to the current row. Because the dates are in descending order, the current row and the next two rows in the window are the current day and the two days before it. The first two days in each partition have fewer than three rows available, so their averages cover only one or two days.

Results:

city	date	temperature	avg_3day
Atlanta	2021-09-01	63.14	63.14
Atlanta	2021-09-02	62.60	62.87
Atlanta	2021-09-03	66.20	63.98
Atlanta	2021-09-04	67.28	65.36
Atlanta	2021-09-05	68.72	67.4
Miami	2021-09-01	65.30	65.3
Miami	2021-09-02	64.40	64.85
Miami	2021-09-03	68.18	65.96
Miami	2021-09-04	68.36	66.98
Miami	2021-09-05	72.50	69.68

Because the ROWS clause depends on the ORDER BY clause in the window frame, you can get the same results by ordering the dates ascending in the window frame and using the current row plus the two preceding rows to calculate the average, like this:

SELECT city, date, temperature,
    AVG(temperature) OVER (
      PARTITION BY city
      ORDER BY date ASC
      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) avg_3day
FROM city_data
ORDER BY city, date;
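
The equivalence of the two frames can be checked mechanically. The sketch below runs both variants against the city_data values shown earlier (temperature only) and compares the results. It uses Python's sqlite3 module as a stand-in for PostgreSQL, and the moving_average helper is a hypothetical name introduced for this illustration. Rounding is done in Python to hide floating-point noise.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_data (date TEXT, city TEXT, temperature REAL)")
conn.executemany("INSERT INTO city_data VALUES (?, ?, ?)", [
    ("2021-09-01", "Miami", 65.30), ("2021-09-01", "Atlanta", 63.14),
    ("2021-09-02", "Miami", 64.40), ("2021-09-02", "Atlanta", 62.60),
    ("2021-09-03", "Miami", 68.18), ("2021-09-03", "Atlanta", 66.20),
    ("2021-09-04", "Miami", 68.36), ("2021-09-04", "Atlanta", 67.28),
    ("2021-09-05", "Miami", 72.50), ("2021-09-05", "Atlanta", 68.72),
])

QUERY = """
    SELECT city, date,
           AVG(temperature) OVER (
               PARTITION BY city
               ORDER BY date {direction}
               ROWS BETWEEN {frame}) AS avg_3day
    FROM city_data
    ORDER BY city, date
"""

def moving_average(direction, frame):
    """Run one variant of the query and round each average to two decimals."""
    sql = QUERY.format(direction=direction, frame=frame)
    return [(city, date, round(avg, 2)) for city, date, avg in conn.execute(sql)]

descending = moving_average("DESC", "CURRENT ROW AND 2 FOLLOWING")
ascending = moving_average("ASC", "2 PRECEDING AND CURRENT ROW")
print(descending == ascending)
```

Both variants select the current day and the two days before it, so the comparison should report that the two result sets match.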

More PostgreSQL Window Functions

CUME_DIST

CUME_DIST() calculates the cumulative distribution of a value in a set of values: the fraction of rows in the partition that precede or are peers of the current row. This function can be particularly useful in statistical analysis.

SELECT salesperson_id, COUNT(*), CUME_DIST() OVER (ORDER BY COUNT(*) DESC)
FROM sales
GROUP BY salesperson_id;
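
As a small worked case, the sketch below computes CUME_DIST() over per-salesperson sale counts. The sales rows are invented for illustration, and the query computes the counts in a CTE first rather than in the same SELECT, which is a simplification of the query above. Python's sqlite3 module stands in for PostgreSQL.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson_id INTEGER)")
# Hypothetical data: salesperson 1 has 3 sales, 2 and 3 have 2 each, 4 has 1.
conn.executemany("INSERT INTO sales VALUES (?)",
                 [(1,), (1,), (1,), (2,), (2,), (3,), (3,), (4,)])

rows = conn.execute("""
    WITH counts AS (
        SELECT salesperson_id, COUNT(*) AS n
        FROM sales
        GROUP BY salesperson_id
    )
    SELECT salesperson_id, n, CUME_DIST() OVER (ORDER BY n DESC) AS cume
    FROM counts
    ORDER BY n DESC, salesperson_id
""").fetchall()

for row in rows:
    print(row)
```

Salespeople 2 and 3 tie with two sales each, so both get 0.75: three of the four rows are at or before them in the ordering.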

DENSE_RANK

DENSE_RANK() assigns a rank to each row within a window partition without gaps in ranking values.

SELECT salesperson_id, COUNT(*), DENSE_RANK() OVER (ORDER BY COUNT(*) DESC)
FROM sales
GROUP BY salesperson_id;

To learn more about how to use RANK() and DENSE_RANK(), check out Understanding RANK() and DENSE_RANK() in PostgreSQL.

FIRST_VALUE

FIRST_VALUE() returns the value evaluated at the first row of the window frame.

SELECT product_name, sales, FIRST_VALUE(product_name) OVER (ORDER BY sales DESC)
FROM product_sales;

LAG

LAG() fetches the value from a previous row in the same partition. By default, it looks one row before the current row.

SELECT product_name, sales, LAG(sales) OVER (ORDER BY sales)
FROM product_sales;

LAST_VALUE

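LAST_VALUE() returns the value evaluated at the last row of the window frame. With ORDER BY, the default frame ends at the current row, so this example extends the frame to the end of the ordered set:

SELECT product_name, sales, LAST_VALUE(product_name) OVER (
    ORDER BY sales DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM product_sales;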
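
LAST_VALUE() is sensitive to where the window frame ends. The sketch below compares the default frame with one that spans the whole ordered set. The product_sales rows are invented for illustration, and Python's sqlite3 module stands in for PostgreSQL.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_sales (product_name TEXT, sales INTEGER)")
# Hypothetical rows with distinct sales values, so no two rows are peers.
conn.executemany("INSERT INTO product_sales VALUES (?, ?)",
                 [("Ring", 30), ("Watch", 20), ("Necklace", 10)])

rows = conn.execute("""
    SELECT product_name,
           LAST_VALUE(product_name) OVER (ORDER BY sales DESC) AS default_frame,
           LAST_VALUE(product_name) OVER (
               ORDER BY sales DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS whole_set
    FROM product_sales
    ORDER BY sales DESC
""").fetchall()

for row in rows:
    print(row)
```

With the default frame, each row's "last" value is the row itself, because the frame stops at the current row. Only the explicit frame returns the lowest-selling product for every row.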

LEAD

LEAD() fetches the value from a subsequent row in the same partition. By default, it looks one row after the current row.

SELECT product_name, sales, LEAD(sales) OVER (ORDER BY sales)
FROM product_sales;
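
LAG() and LEAD() are often used together to compare a row with its neighbors. The sketch below computes the change from the previous day alongside the next day's value. The daily_sales table and its rows are hypothetical, and Python's sqlite3 module stands in for PostgreSQL.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day INTEGER, amount INTEGER)")  # hypothetical
conn.executemany("INSERT INTO daily_sales VALUES (?, ?)",
                 [(1, 100), (2, 150), (3, 120)])

rows = conn.execute("""
    SELECT day, amount,
           LAG(amount)  OVER w AS previous_amount,
           LEAD(amount) OVER w AS next_amount,
           amount - LAG(amount) OVER w AS change
    FROM daily_sales
    WINDOW w AS (ORDER BY day)
    ORDER BY day
""").fetchall()

for row in rows:
    print(row)  # NULL values arrive in Python as None
```

The first row has no previous row and the last row has no next row, so those columns are NULL there, and any arithmetic on them is NULL as well.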

NTILE

NTILE(n) divides an ordered partition into n groups that are as close to equal in size as possible and returns the group number for each row.

SELECT product_name, sales, NTILE(4) OVER (ORDER BY sales)
FROM product_sales;
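
When the row count does not divide evenly, the groups differ in size by at most one row. The sketch below splits five invented product_sales rows into two groups, with Python's sqlite3 module standing in for PostgreSQL.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_sales (product_name TEXT, sales INTEGER)")
# Hypothetical rows: five products with distinct sales values.
conn.executemany("INSERT INTO product_sales VALUES (?, ?)",
                 [("A", 10), ("B", 20), ("C", 30), ("D", 40), ("E", 50)])

rows = conn.execute("""
    SELECT product_name, sales, NTILE(2) OVER (ORDER BY sales) AS bucket
    FROM product_sales
    ORDER BY sales
""").fetchall()

buckets = [r[2] for r in rows]
print(buckets)
```

Five rows in two groups gives one group of three and one of two, and the larger group comes first.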

NTH_VALUE

NTH_VALUE(value, n) returns the value evaluated at the nth row of the window frame, counting from 1, or NULL if the frame has no such row.

SELECT product_name, sales, NTH_VALUE(product_name, 2) OVER (ORDER BY sales DESC)
FROM product_sales;
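
Like the other value functions, NTH_VALUE() only sees rows inside the window frame. The sketch below compares the default frame with one that spans the whole ordered set. The product_sales rows are invented for illustration, and Python's sqlite3 module stands in for PostgreSQL.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_sales (product_name TEXT, sales INTEGER)")
# Hypothetical rows with distinct sales values.
conn.executemany("INSERT INTO product_sales VALUES (?, ?)",
                 [("Ring", 30), ("Watch", 20), ("Necklace", 10)])

rows = conn.execute("""
    SELECT product_name,
           NTH_VALUE(product_name, 2) OVER (ORDER BY sales DESC) AS default_frame,
           NTH_VALUE(product_name, 2) OVER (
               ORDER BY sales DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS whole_set
    FROM product_sales
    ORDER BY sales DESC
""").fetchall()

for row in rows:
    print(row)  # NULL values arrive in Python as None
```

For the top-selling row, the default frame contains only that row, so there is no second row and the result is NULL. With the explicit frame, every row reports the second-best seller.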

PERCENT_RANK

PERCENT_RANK() calculates the relative rank of the current row within its partition, computed as (rank - 1) / (total partition rows - 1), so the result ranges from 0 to 1.

SELECT salesperson_id, COUNT(*), PERCENT_RANK() OVER (ORDER BY COUNT(*) DESC)
FROM sales
GROUP BY salesperson_id;
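
The sketch below works PERCENT_RANK() through on four salespeople. The sales rows are invented for illustration, and the counts are computed in a CTE first rather than in the same SELECT, which is a simplification of the query above. Python's sqlite3 module stands in for PostgreSQL.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson_id INTEGER)")
# Hypothetical data: salesperson 1 has 3 sales, 2 and 3 have 2 each, 4 has 1.
conn.executemany("INSERT INTO sales VALUES (?)",
                 [(1,), (1,), (1,), (2,), (2,), (3,), (3,), (4,)])

rows = conn.execute("""
    WITH counts AS (
        SELECT salesperson_id, COUNT(*) AS n
        FROM sales
        GROUP BY salesperson_id
    )
    SELECT salesperson_id, n, PERCENT_RANK() OVER (ORDER BY n DESC) AS pct
    FROM counts
    ORDER BY n DESC, salesperson_id
""").fetchall()

for row in rows:
    print(row)
```

With four rows, the denominator is 3. The top row has rank 1 and gets 0, the two tied rows share rank 2 and get 1/3, and the last row has rank 4 and gets 1.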

RANK

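RANK() assigns a rank to each row within a window partition. Rows with equal ORDER BY values receive the same rank, and the ranks that follow are skipped, which leaves gaps in the ranking values.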

SELECT salesperson_id, COUNT(*), RANK() OVER (ORDER BY COUNT(*) DESC)
FROM sales
GROUP BY salesperson_id;

ROW_NUMBER

ROW_NUMBER() assigns a unique row number to each row within a window partition.

SELECT salesperson_id, COUNT(*), ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC)
FROM sales
GROUP BY salesperson_id;
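
ROW_NUMBER(), RANK(), and DENSE_RANK() differ only in how they treat ties. The sketch below places all three side by side. The sales rows are invented for illustration, the counts are computed in a CTE first, and ROW_NUMBER() gets salesperson_id as a tie-breaker so its output is deterministic. Python's sqlite3 module stands in for PostgreSQL.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson_id INTEGER)")
# Hypothetical data: salesperson 1 has 3 sales, 2 and 3 have 2 each, 4 has 1.
conn.executemany("INSERT INTO sales VALUES (?)",
                 [(1,), (1,), (1,), (2,), (2,), (3,), (3,), (4,)])

rows = conn.execute("""
    WITH counts AS (
        SELECT salesperson_id, COUNT(*) AS n
        FROM sales
        GROUP BY salesperson_id
    )
    SELECT salesperson_id, n,
           RANK()       OVER (ORDER BY n DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY n DESC) AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY n DESC, salesperson_id) AS row_num
    FROM counts
    ORDER BY n DESC, salesperson_id
""").fetchall()

for row in rows:
    print(row)  # (salesperson_id, n, rnk, dense_rnk, row_num)
```

The two tied salespeople share rank 2 in both ranking functions. RANK() then jumps to 4, DENSE_RANK() continues with 3, and ROW_NUMBER() numbers every row from 1 to 4 regardless of ties.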
