Published at May 2, 2024

Understanding GROUP BY in PostgreSQL (With Examples)

Start supercharging your PostgreSQL today.

Two rows grouped by column in a GROUP BY

Written by Sarah Conway

Regardless of whether you’re using PostgreSQL or TimescaleDB, you can expect exactly the same behavior from using the GROUP BY clause: it’ll allow you to return rows from SELECT statements in groups of one or more columns. It groups all rows that have the same value in all the columns listed into a single group row. This will remove any redundancy and is particularly useful when used with aggregate functions such as SUM(), AVG(), COUNT(), MIN(), or MAX().

The behavior of GROUP BY is very similar to window functions, yet there is a significant difference: when using GROUP BY, rows are collapsed into groups, so each field can no longer be individually accessed. Window functions allow you to reference information for each specific record in addition to viewing the result of the window function.

In short, you'll want to use GROUP BY when you're looking to eliminate duplicate rows, aggregate the result set, and squash rows prior to performing calculations.

The (simplified) syntax for this clause as part of the overall SELECT statement is as follows:

SELECT <expressions> FROM <tables> GROUP BY <condition>;

The full syntax of the GROUP BY clause itself is:

GROUP BY [ ALL | DISTINCT ] grouping_element [, ...]

The order in which columns are specified does not matter.

And, of course, the use of this clause can be part of a complex SELECT statement as well. The following full syntax can be used as reference:

SELECT [ ALL | DISTINCT | DISTINCT ON (<distinct_expressions>) ] <expressions> FROM <tables> [WHERE <conditions>] [GROUP BY <expressions>] [HAVING <condition>] [ORDER BY <expression> [ ASC | DESC | USING <operator> ] [ NULLS FIRST FIRST | NULLS LASTLAST ]] [LIMIT [ <number_rows> | ALL] [OFFSET OFFSET <offset_value> [ ROW | ROWS ]] [FETCH FETCH { FIRST FIRST | NEXT } [ <fetch_rows> ] { ROW | ROWS } ONLY] [FOR { UPDATE | SHARE } OF <table> [ NOWAIT ]];

You’re able to use GROUP BY with any combination of possible clauses within a SELECT statement. It’s evaluated after the FROM and WHERE clauses, prior to HAVING, DISTINCT, ORDER BY, and LIMIT.

Examples

The following examples make use of a basic table called customers that contains the following fictional data:

customer_id	firstname	lastname	state
1	Cheyenne	Holmes	Arizona
2	Alejandro	Ruiz	California
3	Martha	Parker	California
4	Robert	James	Florida
5	Danny	Angelo	North Carolina

Using PostgreSQL GROUP BY Without an Aggregate Function

The simplest form of the GROUP BY clause is to use it without an aggregate function on a single table, as follows:

SELECT customer_id FROM orders GROUP BY customer_id;

When used in this manner, it works similarly to SELECT DISTINCT in that it will remove duplicate rows from a returned result set. In our case, we’re choosing to return all orders and are grouping by the customer_id; this means that we are essentially choosing to only return the customer_id of customers that have placed an order.

Result:

customer_id

Note: When using GROUP BY without pairing it with an aggregate function in the above manner, for readability, it is encouraged to instead use SELECT DISTINCT as it serves functionally the same purpose with more concise wording. For example, the above query would be greatly simplified and re-written as SELECT DISTINCT customer_id FROM orders;

Using PostgreSQL GROUP BY With an Aggregate Function

Aggregate functions can be paired with GROUP BY for quick and condensed result sets for analytical or informational purposes. Retrieving these summarized and calculated results is the primary use case for GROUP BY.

A simple example that groups the results by a single column is to count the number of customers for each state:

SELECT COUNT(customer_id), state FROM customers GROUP BY state;

Results:

count	state
2	California
1	Arizona
1	North Carolina
1	Florida

Using PostgreSQL GROUP BY With a JOIN

To list the number of orders made by each customer, you can join both tables and group the results by the customer name. For this example, a table was created named orders that contains the following fictional data:

order_id	customer_id	order_total
1	1	85
2	2	21
3	1	101
4	3	44
5	2	3

We’ll select the columns for each customer’s first and last name, along with the count of orders for each customer, and sort these results across multiple columns using GROUP BY.

SELECT customers.firstname, customers.lastname, COUNT(orders.order_id) FROM orders LEFT JOIN customers ON orders.customer_id = customers.customer_id GROUP BY firstname, lastname;

Results:

firstname	lastname	count
Martha	Parker	1
Cheyenne	Holmes	2
Alejandro	Ruiz	2

Using PostgreSQL GROUP BY With a JOIN and ORDER BY

Continuing to build on the previous example, we can also sort the results in ascending or descending order by adding on ORDER BY. Ascending (ASC) order is the default. Else, you can specify DESC to get customers who have placed the most orders first in the result set, as shown here:

SELECT customers.firstname, customers.lastname, COUNT(orders.order_id) FROM orders LEFT JOIN customers ON orders.customer_id = customers.customer_id GROUP BY firstname, lastname ORDER BY count DESC;

Note: If attempting to run a query similar to this for analytical purposes, using the LIMIT clause would also likely be helpful for more performant and efficient results.

Result:

firstname	lastname	count
Cheyenne	Holmes	2
Alejandro	Ruiz	2
Martha	Parker	1

Using PostgreSQL GROUP BY With HAVING

To filter over grouped rows, you can specify the HAVING clause to remove groups from the result set that do not satisfy a specified condition. This clause occurs after all grouping and aggregation of data, meaning after all rows are returned, records that do not meet the specified condition are removed from the resultset.

As an example referencing the two tables we’ve already created and queries we’ve already tried, we can combine prior queries to return the count of customers in each state that have placed more than one order:

SELECT customers.firstname, customers.lastname, customers.state, COUNT(orders.order_id) AS total_number_of_orders FROM customers LEFT JOIN orders ON customers.customer_id = orders.customer_id GROUP BY firstname, lastname, state HAVING COUNT(orders.order_id) > 1;

Result:

firstname	lastname	state	total_number_of_orders
Alejandro	Ruiz	California	2
Cheyenne	Holmes	Arizona	2

Using PostgreSQL GROUP BY With WHERE

What about the WHERE clause? That’ll need to be used before GROUP BY in a SELECT statement, as records are filtered with WHERE before the execution of HAVING. As a result, using WHERE leads to a faster result over HAVING. Both statements can be used within the same query; however, if the WHERE clause can provide the desired result on its own, the use of this clause should be prioritized to ensure an efficiently designed query.

If we want to look for the count of orders for a customer whose name closely matches “Cheyenn,” we’d use the following SQL statement:

SELECT customers.firstname, customers.lastname, customers.state, COUNT(orders.order_id) AS total_number_of_orders FROM customers LEFT JOIN orders ON customers.customer_id = orders.customer_id WHERE firstname LIKE 'Cheyen%' GROUP BY firstname, lastname, state;

Result:

firstname	lastname	state	total_number_of_orders
Cheyenne	Holmes	Arizona	2

Next Steps

To learn more about the GROUP BY clause and how to use it in PostgreSQL, you can check out PostgreSQL’s documentation on SELECT (and the GROUP BY clause) and aggregate expressions. The Table Expressions section of the documentation also contains information relevant to the GROUP BY clause.

It’s possible to achieve even more complex operations by combining GROUP BY with GROUPING SETS, CUBE, and ROLLUP. More information about these three concepts can be found in the official PostgreSQL documentation.

If you want to experiment with GROUP BY in a natively optimized and performant PostgreSQL database, all while enjoying the benefits of automatic data partitioning and columnar compression for further performance improvements and a reduced storage footprint, create a free Timescale account today.