Optimizing Array Queries With GIN Indexes in PostgreSQL


Written by Warda Bibi

Efficient querying is crucial when managing data in PostgreSQL. One way to ensure this is by using the right indexes on your tables. Indexes help queries retrieve results faster by skipping irrelevant parts of the table. Among the various index types PostgreSQL offers, one important index type is GIN (Generalized Inverted Index).  

Although its name might sound complex, the concept is straightforward. GIN indexes excel at searching for elements within composite items such as arrays, text fields, or JSON. In this article, we'll walk you through how to optimize array queries using GIN indexes.

This exploration of GIN indexes highlights why we've invested heavily in optimizing indexing capabilities at Timescale. Our work on PostgreSQL indexes for compressed columnar data represents a significant advancement, allowing developers to enjoy the benefits of columnar compression while maintaining the flexibility of PostgreSQL's rich indexing capabilities. Check it out!

The Importance of Indexing an Array

Without indexes, PostgreSQL processes queries by performing sequential full-table scans, examining each row in the table to find matches. As the dataset grows, this approach becomes inefficient, leading to slower query execution due to increased I/O operations and higher CPU usage. 

Consider a table with a column sensor_readings {Temperature, Humidity, Motion, BatteryLevel, Alert} containing arrays of values captured by IoT devices:

  • Row 1: {25.5, 60.2, NULL, NULL, 1} (Temperature: 25.5°C, Humidity: 60.2%, Alert: 1)

  • Row 2: {NULL, NULL, 0, 15.3, 1} (Motion: 0, Battery Level: 15.3%, Alert: 1)

  • Row 3: {25.5, NULL, 0, 10.5, NULL} (Temperature: 25.5°C, Motion: 0, Battery Level: 10.5%)

Suppose you want to find all rows where the sensor_readings array contains the values {25.5, 1} (Temperature = 25.5°C and Alert = 1). PostgreSQL will compare each value in the search array {25.5, 1} with the elements in the sensor_readings array for every row. 

For example: 

  • In row 1 ({25.5, 60.2, NULL, NULL, 1}), PostgreSQL checks if both 25.5 and 1 are present. Since they are, this row is a match.

  • In row 2 ({NULL, NULL, 0, 15.3, 1}), only 1 is present, so this row is not a match.

  • In row 3 ({25.5, NULL, 0, 10.5, NULL}), only 25.5 is present, so this row is not a match.

PostgreSQL goes through each row in the table and checks if all the values in the search array match those in the column's array. However, performing this comparison for each row can be very slow for large datasets, potentially resulting in thousands of operations, especially as the size of the arrays and the number of rows increase.

By using an index, PostgreSQL avoids these repeated, slow comparisons. Instead of comparing entire arrays row-by-row, the index can quickly identify rows that match the condition, dramatically improving performance, even for large datasets. 
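
As a rough sketch (table and column names are assumed for illustration), the setup and query above might look like this:

-- hypothetical table; the array layout follows the example rows above
CREATE TABLE devices (
    device_id       serial PRIMARY KEY,
    sensor_readings numeric[]
);

-- without an index, this containment check scans every row
SELECT * FROM devices WHERE sensor_readings @> ARRAY[25.5, 1]::numeric[];

-- a GIN index lets PostgreSQL answer the same query with a bitmap scan
CREATE INDEX idx_devices_sensor_readings ON devices USING GIN (sensor_readings);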

For faster queries, learn how to optimize your PostgreSQL database indexes.

What Are GIN Indexes?  

The Generalized Inverted Index (GIN) is specifically designed to index composite values, such as arrays or JSON fields, in tables. It is particularly useful for queries that need to search for elements within these composite items efficiently. 

 Let's say you have transaction data in a database where: 

  • Each row represents an account. 

  • Each account has done multiple transactions, stored as an array of transaction values.

Acc_id | Transactions
-------+--------------------
1      | {$100, $400, $600}
2      | {$200, $500}

Here:

  • Row 1: Account 1 (Acc1) has completed transactions of $100, $400, and $600. 

  • Row 2: Account 2 (Acc2) has completed transactions of $200 and $500.   

Now, you frequently run queries to find transactions that meet specific criteria, such as transactions exceeding a certain value (e.g., "$300"). 

Without an index, the database must scan each row and check if any transaction value in the array meets the condition (e.g., "greater than $300"). This requires evaluating every element in every array across all rows, which is slow and resource-intensive.  

A GIN index works by creating a mapping between each unique transaction value (the key) and the corresponding row IDs (accounts) that contain that value. This efficient mapping allows PostgreSQL to quickly locate rows containing specific elements without scanning the entire table. 

  • If a transaction value of $300 is found in the array of transactions for Account 1 (Row 1), the index records that $300 is associated with Account 1. 

  • Similarly, if $300 is found in Account 2, the index maps $300 to Account 2 as well. 

  • This mapping is stored in a structure called listOfAccounts, which is a set of row IDs (accounts) linked to a specific transaction value. 

  • For example, if $300 appears in the transaction arrays for Accounts 1 and 2, the listOfAccounts for $300 would be {1, 2}. 

This mapping allows the GIN index to quickly reference the accounts (rows) where $300 exists, eliminating the need to scan every row and every transaction. 
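
A minimal sketch of that layout, with amounts stored as a numeric array (names and types assumed):

-- one row per account, with all of its transaction amounts in one array
CREATE TABLE accounts (
    acc_id       int PRIMARY KEY,
    transactions numeric[]
);

CREATE INDEX idx_accounts_transactions ON accounts USING GIN (transactions);

-- accounts whose transactions array contains the value 300
SELECT acc_id FROM accounts WHERE transactions @> ARRAY[300]::numeric[];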

The term "generalized" in GIN refers to its ability to handle different types of data, such as arrays, JSON, or full-text documents. However, GIN itself doesn’t understand these data types. Instead, it relies on operator classes to guide it on how to interact with and process these various data types. 

What Are Operator Classes?

An operator class can be thought of as a rulebook that provides GIN with instructions on how to manage specific data types. Each operator class defines the operations that can be performed on the data. For example:  

  • For arrays, does one array contain another (@>), or do two arrays overlap (&&)?  

  • For JSON, does a JSON object contain a specific key (?), or does one JSON object contain another (@>)? 

These instructions ensure that GIN efficiently handles complex data types like arrays and JSON. Each operation, such as @> or ?, is assigned a strategy number within the operator class. For instance, the @> operator for arrays might be strategy 1, and the && operator could be strategy 2. When you run a query like SELECT * FROM my_table WHERE my_array_column @> '{1, 2}';, the GIN index doesn't directly understand @>.

Instead, it looks up the operator class for arrays, finds the strategy number assigned to @>, and uses the logic associated with that strategy to execute the operation.
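
If you want to see the actual assignments on your server, the system catalogs expose them. A query along these lines lists the operators and strategy numbers registered for the built-in GIN operator class for arrays (array_ops):

-- operators and strategy numbers for GIN's array_ops operator class
SELECT amop.amopstrategy,
       amop.amopopr::regoperator AS operator
FROM pg_opclass opc
JOIN pg_am am     ON am.oid = opc.opcmethod
JOIN pg_amop amop ON amop.amopfamily = opc.opcfamily
WHERE am.amname = 'gin'
  AND opc.opcname = 'array_ops'
ORDER BY amop.amopstrategy;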

What makes GIN "generalized" is its ability to handle different data types using separate operator classes. GIN doesn’t need to be redesigned every time a new data type is introduced. You simply define a new operator class that tells it how to work with that type. For example: 

  • One operator class teaches GIN how to work with arrays. 

  • Another operator class teaches it how to handle JSON. 

You can even define custom operator classes for your own data types. This flexibility makes GIN so powerful it can index a wide variety of data types without needing a complete overhaul. By separating the GIN index method (which is flexible) from the operator classes (which are data-type-specific), PostgreSQL can easily support multiple data types. This approach also allows new data types or operations to be added without having to modify the GIN index itself. 

The "inverted" aspect of GIN refers to its index structure, which differs fundamentally from the traditional B-tree index. Instead of mapping a single value to a specific row (as in B-trees), GIN creates a reverse mapping where each indexed value points to a list of rows that contain that value. This design allows GIN to store multiple representations of a single row because a single row can contain multiple indexed values.  

This "inverted" mapping structure allows GIN to handle complex queries exceptionally well, such as those involving multiple conditions (e.g., searching for rows where an array contains specific elements or where a document contains specific words). GIN combines these indexed entries to efficiently retrieve the relevant rows, even for queries that would be costly to execute with other types of indexes. 

Another way of explaining GIN indexes comes from a presentation by Oleg Bartunov and Alexander Korotkov at PGConf.EU in Prague. They describe a GIN index like the table of contents in a book, where the heap pointers (to the actual table) are the page numbers. Multiple entries can be combined to yield a specific result, like the search for “compensation accelerometers” in this example:

[Figure: a GIN index illustrated as a book's table of contents, mapping keywords to page numbers]

In the table with the index, ‘compensation’ points to pages 30 and 68, indicating that these pages contain the keyword ‘compensation’. Similarly, ‘accelerometers’ points to pages 5, 10, 25, 28, 30, 36, 58, 59, 61, 73, 74, and 68, showing that these pages include the keyword ‘accelerometers’. When searching for ‘compensation accelerometers’, page 30 is common to both keywords, so page 30 is returned as the result.

Before we move on, know that GIN indexes only support bitmap index scans (not plain index scans or index-only scans). This happens because a GIN index can list the same row pointer multiple times, under the different tokens (keys) that a field is broken into. To prevent returning duplicate rows, GIN forces queries through a bitmap index scan, which inherently deduplicates these pointers. Let's see an example.

Suppose we have a table ‘documents’ where each document contains an array of keywords, and we have a GIN index on the ‘keywords’ column.

document_id | keywords
------------+-------------------------
1           | {apple, banana, cherry}
2           | {banana, cherry, date}
3           | {apple, cherry, date}
4           | {banana, apple}

Here, the GIN index stores a list of tokens (keywords) and the document IDs (row pointers) associated with those tokens. Each token (keyword) points to the rows where it appears. The internal GIN index representation looks like this:

  • apple → {1, 3, 4} (appears in documents 1, 3, and 4) 

  • banana → {1, 2, 4} (appears in documents 1, 2, and 4) 

  • cherry → {1, 2, 3} (appears in documents 1, 2, and 3) 

  • date → {2, 3} (appears in documents 2 and 3) 

Now, when a query is run to find documents that contain both ‘apple’ and ‘banana’, PostgreSQL uses the bitmap index scan to process the GIN index and ensure deduplication. Let's say we run the following query: 

SELECT * FROM documents WHERE keywords @> ARRAY['apple', 'banana'];

The operator @> checks if the array of keywords contains the specified values (in this case, 'apple' and 'banana'). PostgreSQL first retrieves the rows for the keywords apple and banana from the GIN index. 

  • For ‘apple’, the GIN index tells us the rows are {1, 3, 4}. 

  • For ‘banana’, the GIN index tells us the rows are {1, 2, 4}. 

PostgreSQL then creates bitmaps for both sets of results. A bitmap is a data structure that represents rows using a series of bits, where each bit corresponds to a unique row and is set to 1 for a match and 0 otherwise. For apple ({1, 3, 4}), the bitmap might look like: 

1 0 1 1 

Where: 

  • 1 indicates the document is included (e.g., documents 1, 3, and 4). 

  • 0 indicates the document is not included (e.g., document 2 is not included for apple). 

For banana ({1, 2, 4}), the bitmap might look like: 

1 1 0 1 

Where: 

  • 1 indicates the document is included (e.g., documents 1, 2, and 4). 

  • 0 indicates the document is not included (e.g., document 3 is not included for banana). 

PostgreSQL now ANDs the two bitmaps together to find the rows that match both conditions (contain both 'apple' and 'banana'). 

Bitmap for apple:    1 0 1 1
Bitmap for banana:   1 1 0 1
-----------------------------
Result bitmap:       1 0 0 1

The result bitmap shows 1 for documents 1 and 4, meaning these are the only documents that contain both 'apple' and 'banana'. Documents 2 and 3 are excluded because they do not meet both conditions. 

The deduplication process occurs in the AND operation by ensuring that each document is included only once in the result. The final query result will be: 

document_id | keywords
------------+-------------------------
1           | {apple, banana, cherry}
4           | {banana, apple}

These are the documents where both apple and banana appear in the keywords array.  Don’t be surprised when EXPLAIN always shows bitmap index/heap scans for your GIN indexes. Bitmap scans are ideal for deduplication as they can easily identify and combine duplicate row pointers, resulting in a clean set of relevant rows. 
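
You can confirm this with EXPLAIN. Against a documents table like the one above, with a GIN index on keywords, the plan typically consists of a Bitmap Index Scan on the GIN index feeding a Bitmap Heap Scan; the exact plan depends on table size and statistics:

EXPLAIN SELECT * FROM documents WHERE keywords @> ARRAY['apple', 'banana'];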

Supported operators 

The standard distribution of PostgreSQL includes a GIN operator class for arrays, which supports indexed queries using these operators: 

  • <@ : If the left-hand array is contained within (or is a subset of) the right-hand array

  • @> : If the left-hand array contains all the elements of the right-hand array

  • = : If two arrays are equal

  • && : If two arrays have any elements in common

 

Implementation 

Let's say we have a table “inventory” with a column “product” of type TEXT[] (string array). We also have a GIN index on the product column. This index will help optimize queries that involve array operators like <@, @>, =, and &&.

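A minimal sketch of that setup (index name assumed):

-- text[] column holding product tags, plus a GIN index on it
CREATE TABLE inventory (
    id      serial PRIMARY KEY,
    product text[]
);

CREATE INDEX idx_inventory_product ON inventory USING GIN (product);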

To ensure that PostgreSQL uses the GIN index instead of performing a sequential scan, you'll need a substantial number of rows. A sequential scan is more likely to occur if the dataset is small because PostgreSQL might decide that scanning the entire table is more efficient than using the index.  

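One way to generate such data (values are illustrative) is to cycle through a few fixed arrays:

-- load 1,000 rows that reuse a small set of product combinations
INSERT INTO inventory (product)
SELECT CASE i % 3
         WHEN 0 THEN ARRAY['laptop', 'server']
         WHEN 1 THEN ARRAY['laptop', 'server', 'router']
         ELSE        ARRAY['tablet', 'router']
       END
FROM generate_series(1, 1000) AS i;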

This will insert 1,000 rows with a repetitive set of string values in the array column.  

<@ (Contained by) 

The <@ operator is used to check if the left-hand array is fully contained within the right-hand array. 

  • Left-hand array: This is the array stored in your database column (e.g., the product column in an inventory table). It acts as the "source" array being checked. 

  • Right-hand array: This is the array you specify in your query for comparison. It contains the elements you want to verify. 

PostgreSQL uses the <@ operator to compare the two arrays and determine whether every element of the left-hand array exists within the right-hand array.

This operator is particularly useful for filtering rows based on whether their array values match a desired subset. 

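With the inventory data above, a query like the following returns every row whose product array is fully contained in the list supplied in the query (search values are illustrative):

SELECT * FROM inventory
WHERE product <@ ARRAY['laptop', 'server', 'router'];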

 

@> (Contains) 

The @> operator checks if the right-hand array is contained by the left-hand array; in other words, the right array must be a subset of the left array.

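For example (search values are illustrative):

SELECT * FROM inventory
WHERE product @> ARRAY['laptop', 'server'];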

The results will include arrays that have [laptop, server] as a subset. 

= (Equality) 

The = operator checks if two arrays are exactly equal. It compares both the size and the content of the arrays, meaning the elements must be in the same order. 

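For example, the following matches only rows whose array is exactly {laptop, server}, with the elements in that order (values are illustrative):

SELECT * FROM inventory
WHERE product = ARRAY['laptop', 'server'];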

 

&& (Overlap) 

The && operator in PostgreSQL checks whether two arrays overlap, that is, whether they share at least one element. This is different from the equality operator because it doesn't require the arrays to have the same length or element order; it only checks for common elements between the two arrays.

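For example, the following returns any row that shares at least one element with the search array (values are illustrative):

SELECT * FROM inventory
WHERE product && ARRAY['router', 'tablet'];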

  

Using ANY with GIN index

The ANY construct in PostgreSQL is used to compare a value against the elements of an array. For example, value = ANY(array) checks whether the value matches at least one element in the array. It's shorthand for multiple OR conditions, making queries involving arrays more concise. So instead of writing

value = element1 OR value = element2 OR value = element3

(where element1, element2, etc., are values in the array), you can simply write

value = ANY(array)

Note: PostgreSQL will not use the GIN index with this form. The ANY construct can be combined with various operators, but it is not an operator itself. When it is used as

constant = ANY (array_expression)

only indexes that support the = operator on individual array elements would qualify, and GIN indexes are not suitable for that. To utilize the GIN index, use a containment operator with a one-element array instead (e.g., array_expression @> ARRAY[constant]).

Suppose we have a table test with a column of type int[], and a GIN index on that column.

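A minimal sketch of such a table (column name assumed):

CREATE TABLE test (
    id   serial PRIMARY KEY,
    vals int[]
);

CREATE INDEX idx_test_vals ON test USING GIN (vals);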

If you try to run a search query using ANY, the query will likely result in a sequential scan. On the other hand, using @> will yield the same result but with a bitmap heap scan. 

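Comparing the two forms with EXPLAIN makes the difference visible (the constant 5 is illustrative):

-- the ANY form cannot use the GIN index, so the planner falls back to a sequential scan
EXPLAIN SELECT * FROM test WHERE 5 = ANY (vals);

-- the containment form with a one-element array is eligible for the GIN index (bitmap scan)
EXPLAIN SELECT * FROM test WHERE vals @> ARRAY[5];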

Moreover, constant = ANY (array_expression) is not completely equivalent to array_expression @> ARRAY[constant]. Array operators can raise an error when NULL elements are involved, while the ANY construct handles NULL on either side, so the two forms can behave differently when NULLs or data type mismatches come into play.

Intarray Extension Challenges

In PostgreSQL, integer arrays (int[]) are a specialized data type that allows you to store multiple integer values within a single column of a table. Each element in the array is of the integer type, and the array itself can be one-dimensional or multi-dimensional.

For example, a column of type int[] can store data like:

  • {1, 2, 3} (a one-dimensional array)

  • {{1, 2}, {3, 4}} (a two-dimensional array)

 

Note: The intarray extension provides specialized functions and operators for integer arrays in PostgreSQL. However, these custom operators replace PostgreSQL's default array operators (like @>, <@). If you create a GIN index without accounting for this, PostgreSQL will ignore the index for queries using these operators and perform a sequential scan instead, significantly slowing down performance on large datasets.

For example, a query like this

SELECT * FROM my_table WHERE codes @> array[123];

will skip the GIN index and rely on a sequential scan. 

To fix this, you need to specify the correct operator class for integer arrays when creating the GIN index. The intarray extension provides a dedicated operator class called gin__int_ops, which ensures that the index works with the custom operators introduced by intarray. 

Here’s how to create the index with the correct operator class: 

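Assuming the my_table/codes example from the query above, and the intarray extension installed, the index definition uses the gin__int_ops operator class:

CREATE EXTENSION IF NOT EXISTS intarray;

-- GIN index that works with intarray's operators on integer arrays
CREATE INDEX idx_my_table_codes ON my_table USING GIN (codes gin__int_ops);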

 

The Downside of GIN Indexes: Expensive Updates 

Unlike regular indexes that create one entry per row, GIN indexes can create multiple entries for a single row, making them versatile but challenging to update. Modifying a GIN index can be expensive, potentially affecting tens or hundreds of index entries for just one row. This overhead slows down INSERT and UPDATE operations, especially with large datasets.

To address this, PostgreSQL uses fastupdate (enabled by default) for GIN indexes. Rather than immediately updating the index, fastupdate batches multiple updates into a temporary "pending list" that stores new or updated entries. These "deferred updates" improve initial write performance but must eventually be processed, adding overhead later.

 This deferred data is flushed to the main index in one of three scenarios: 

  • When the size of the pending list exceeds the value of gin_pending_list_limit, PostgreSQL triggers a process to flush the accumulated entries from the pending list into the main GIN index. The gin_pending_list_limit is a configuration parameter in PostgreSQL that determines the maximum size of the pending list for a GIN index. By default, this limit is set to 4 MB. 

  • When an explicit call to gin_clean_pending_list is made, it manually flushes the contents of the pending list to the main GIN index. 

  • When autovacuum runs on the table with the GIN index. Autovacuum automatically maintains tables and indexes in PostgreSQL by cleaning up dead tuples and analyzing table statistics, and it flushes the pending list of GIN indexes as part of its routine maintenance at the end of the vacuum process.

While this deferred flushing helps optimize performance initially, it can cause noticeable slowdowns during write operations if the pending list becomes too large. For example, after every Nth INSERT or UPDATE, when the pending list exceeds its limit, PostgreSQL will flush the data to the main index, resulting in a significant delay for that operation.   
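
A quick sketch of the relevant knobs, reusing the documents/keywords example from earlier (index name assumed, values illustrative):

-- disable the pending list entirely so every write updates the index immediately
CREATE INDEX idx_documents_keywords ON documents USING GIN (keywords) WITH (fastupdate = off);

-- or keep fastupdate but shrink the per-index pending-list limit (value in kilobytes)
ALTER INDEX idx_documents_keywords SET (gin_pending_list_limit = 1024);

-- flush the pending list manually
SELECT gin_clean_pending_list('idx_documents_keywords'::regclass);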

Conclusion 

GIN indexes provide powerful capabilities for specialized queries and complex data types in PostgreSQL, but they must be used strategically, especially on write-heavy tables.

This exploration of GIN indexes highlights why we've invested heavily in optimizing indexing capabilities at Timescale. Our work on PostgreSQL indexes for compressed columnar data represents a significant advancement, allowing developers to enjoy the benefits of columnar compression while maintaining the flexibility of PostgreSQL's rich indexing capabilities.

TimescaleDB leverages PostgreSQL's native capabilities, including its support for GIN indexes, while adding several performance enhancements for time-partitioned data. This makes your queries faster so you can power analytics in real time.

Sign up for a free and fully managed PostgreSQL cloud on Timescale Cloud (no credit card required), or install TimescaleDB and see firsthand how we're pushing the boundaries of PostgreSQL.
