PostgreSQL as a Vector Database: A pgvector Tutorial

PostgreSQL as a Vector Database: A pgvector Tutorial

Vector databases enable efficient storage and search of vector data. They are essential to developing and maintaining AI applications using Large Language Models (LLMs).

With some help from the pgvector extension, you can leverage PostgreSQL as a vector database to store and query OpenAI embeddings. OpenAI embeddings are a type of data representation (in the shape of vectors, i.e., lists of numbers) used to measure the similarity of text strings for OpenAI’s models.

In this article, we work through the example of creating a chatbot to answer questions about Timescale. The chatbot will be trained on content from the Timescale Developer Q&A blog posts. This example will illustrate the key concepts for creating, storing, and querying OpenAI embeddings with PostgreSQL and pgvector.

This example has three parts:

  • Part 1: How to create embeddings from content using the OpenAI API.
  • Part 2: How to use PostgreSQL as a vector database and store OpenAI embedding vectors using pgvector.
  • Part 3: How to use embeddings retrieved from a vector database to augment LLM generation.

One could think of this as a “hello world” tutorial for building a chatbot that can reference a company knowledge base or developer docs.

Jupyter Notebook and Code: You can find all the code used in this tutorial in a Jupyter Notebook, as well as sample content and embeddings on the Timescale GitHub: timescale/vector-cookbook. We recommend cloning the repo and following along by executing the code cells as you read through the tutorial.

The Big Picture: Open AI Embeddings

Foundational models of AI (e.g., GPT-3 or GPT-4) may be missing some information needed to give a good answer to certain specific questions. That’s because relevant information was not in the dataset used to train the model. (For example, the information is stored in private documents or only became available recently.) This lack of data may make these models unsuitable as a chatbot for specific information banks.

Retrieval Augmented Generation (RAG) gives a simple solution: provide additional context to the foundational model in the prompt. For example, if the model has no data on cronuts and you ask, “What is a cronut?” you may get an improper answer. In this case, you can transform the prompt by adding context: “A cronut resembles a doughnut and is made from croissant-like dough filled with flavored cream and fried in grapeseed oil. What is a cronut?” 

The foundational model can then use its knowledge of donuts and croissants to wax eloquently about cronuts. This technique is powerful—it allows you to “teach” foundational models about things only you know about and use that to create a ChatGPT++ experience for your users!

But what context do you provide to the model? If you have a library of information, how do you know what’s relevant to a given question? That is what embeddings are for. OpenAI embeddings are a mathematical representation of the semantic meaning of a piece of text that allows for similarity search.

With this representation, if you get a user question and calculate its embedding, you can use a similarity search against data embeddings in your library to find the most relevant information. But that requires having an embedding representation of your library.  

This post is a guide to creating, storing, and querying OpenAI vector embeddings using pgvector, the extension that turns PostgreSQL into a vector database.

Using pgvector for a PostgreSQL Vector Database

Pgvector is an open-source extension for PostgreSQL that enables storing and searching over machine learning-generated embeddings. It provides different capabilities that let users identify exact and approximate nearest neighbors. Pgvector is designed to work seamlessly with other PostgreSQL features, including indexing and querying.

Now we’re ready to start building our chatbot!

Pre-Requisites and Configuration for PostgreSQL Vector Databases

  • Install Python.
  • Install and configure a Python virtual environment. We recommend Pyenv.
  • Install the requirements for this notebook using the following command:
pip install -r requirements.txt

Import all the packages we will be using:

import openai
import os
import pandas as pd
import numpy as np
import json
import tiktoken
import psycopg2
import ast
import pgvector
import math
from psycopg2.extras import execute_values
from pgvector.psycopg2 import register_vector

You’ll need to sign up for an OpenAI Developer Account and create an OpenAI API Key – we recommend getting a paid account to avoid rate limiting and settting a spending cap so that you avoid any surprises with bills.

Once you have an OpenAI API key, it’s a best practice to store it as an environment variable and then have your Python program read it.

#First, run export OPENAI_API_KEY=sk-YOUR_OPENAI_API_KEY...

# Get openAI api key by reading local .env file
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) 
openai.api_key  = os.environ['OPENAI_API_KEY'] 

Part 1: Create Embeddings for Your PostgreSQL Vector Database

Embeddings measure how related text strings are. First, we'll create embeddings using the OpenAI API on some text we want the LLM to answer questions on.

In this example, we'll use content from the Timescale blog, specifically from the Developer Q&A section, which features posts by Timescale users talking about their real-world use cases.

You can replace this blog data with any text you want to embed, such as your own company blog, developer documentation, internal knowledge base, or any other information you’d like to have a “ChatGPT-like” experience over.

# Load your CSV file into a pandas DataFrame
df = pd.read_csv('blog_posts_data.csv')

The output looks like this:





How to Build a Weather Station With Elixir, Ne...

This is an installment of our “Community Membe...


CloudQuery on Using PostgreSQL for Cloud Asset...

This is an installment of our “Community Membe...


How a Data Scientist Is Building a Time-Series...

This is an installment of our “Community Membe...


How Conserv Safeguards History: Building an En...

This is an installment of our “Community Membe...


How Messari Uses Data to Open the Cryptoeconom...

This is an installment of our “Community Membe...

1.1 Calculate the cost of embedding data

It's usually a good idea to calculate how much creating embeddings for your selected content will cost. We provide a number of helper functions to calculate a cost estimate before creating the embeddings to help us avoid surprises.

For OpenAI, you are charged on a per-token basis for embeddings created. The total cost will be less than $0.01 for the blog posts we want to embed, thanks to OpenAI’s recent announcement of a 75 % cost reduction in their most popular embedding model, text-embedding-ada-002.

What is a token? Tokens are common sequences of characters found in text. Roughly speaking, a token is three-quarters (¾) of a word. Large language models, like GPT-3 and GPT-4 made by OpenAI, are trained to understand the statistical relationships between tokens and predict the next token in a sequence. Learn more about tokens with OpenAI’s Tokenizer tool.

# Helper functions to help us create the embeddings

# Helper func: calculate number of tokens
def num_tokens_from_string(string: str, encoding_name = "cl100k_base") -> int:
    if not string:
        return 0
    # Returns the number of tokens in a text string
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

# Helper function: calculate length of essay
def get_essay_length(essay):
    word_list = essay.split()
    num_words = len(word_list)
    return num_words

# Helper function: calculate cost of embedding num_tokens
# Assumes we're using the text-embedding-ada-002 model
# See
def get_embedding_cost(num_tokens):
    return num_tokens/1000*0.0001

# Helper function: calculate total cost of embedding all content in the dataframe
def get_total_embeddings_cost():
    total_tokens = 0
    for i in range(len(df.index)):
        text = df['content'][i]
        token_len = num_tokens_from_string(text)
        total_tokens = total_tokens + token_len
    total_cost = get_embedding_cost(total_tokens)
    return total_cost

# quick check on total token amount for price estimation
total_cost = get_total_embeddings_cost()
print("estimated price to embed this content = $" + str(total_cost))

1.2 Create smaller chunks of content

The OpenAI API has a limit to the maximum number of tokens it can create an embedding for in a single request: 8,191 to be specific.

OpenAI embeddings

To get around this limit, we'll break up our text into smaller chunks. Generally, it's a best practice to “chunk” the documents you want to create embeddings into groups of a fixed token size.

The precise number of tokens to include in a chunk depends on your use case and your model’s context window—the number of input tokens it can handle in a prompt.

For our purposes, we'll aim for chunks of around 512 tokens each. Chunking text up is a complex topic worthy of its own blog post. We’ll illustrate a simple method we found to work well below. If you want to read about other approaches, we recommend this blog post and this section of the LangChain docs.

Note: If you prefer to skip this step, you can use the provided file: blog_data_and_embeddings.csv, which contains the data and embeddings that you'll generate in this step.

The code below creates a new list of our blog content while retaining the metadata associated with the text, such as the blog title and URL that the text is associated with.

# Create new list with small content chunks to not hit max token limits
# Note: the maximum number of tokens for a single request is 8191

# list for chunked content and embeddings
new_list = []
# Split up the text into token sizes of around 512 tokens
for i in range(len(df.index)):
    text = df['content'][i]
    token_len = num_tokens_from_string(text)
    if token_len <= 512:
        new_list.append([df['title'][i], df['content'][i], df['url'][i], token_len])
        # add content to the new list in chunks
        start = 0
        ideal_token_size = 512
        # 1 token ~ 3/4 of a word
        ideal_size = int(ideal_token_size // (4/3))
        end = ideal_size
        #split text by spaces into words
        words = text.split()

        #remove empty spaces
        words = [x for x in words if x != ' ']

        total_words = len(words)
        #calculate iterations
        chunks = total_words // ideal_size
        if total_words % ideal_size != 0:
            chunks += 1
        new_content = []
        for j in range(chunks):
            if end > total_words:
                end = total_words
            new_content = words[start:end]
            new_content_string = ' '.join(new_content)
            new_content_token_len = num_tokens_from_string(new_content_string)
            if new_content_token_len > 0:
                new_list.append([df['title'][i], new_content_string, df['url'][i], new_content_token_len])
            start += ideal_size
            end += ideal_size

Now that our text is chunked better, we can create embeddings for each chunk of text using the OpenAI API.

We’ll use this helper function to create embeddings for a piece of text:

# Helper function: get embeddings for a text
def get_embeddings(text):
   response = openai.Embedding.create(
       input = text.replace("\n"," ")
   embedding = response['data'][0]['embedding']
   return embedding

And then create embeddings for each chunk of content:

# Create embeddings for each piece of content
for i in range(len(new_list)):
   text = new_list[i][1]
   embedding = get_embeddings(text)

# Create a new dataframe from the list
df_new = pd.DataFrame(new_list, columns=['title', 'content', 'url', 'tokens', 'embeddings'])

The new data frame should look like this:

Title Content URL Tokens Embeddings
0 How to Build a Weather Station With Elixir, Ne... This is an installment of our “Community Membe... 501 [0.021440856158733368, 0.02200360782444477, -0...
1 How to Build a Weather Station With Elixir, Ne... capture weather and environmental data. In all... 512 [0.016165969893336296, 0.011341351084411144, 0...
2 How to Build a Weather Station With Elixir, Ne... command in their database migration:SELECT cre... 374 [0.022517921403050423, -0.0019158280920237303,...
3 CloudQuery on Using PostgreSQL for Cloud Asset... This is an installment of our “Community Membe... 519 [0.009028822183609009, -0.005185891408473253, ...
4 CloudQuery on Using PostgreSQL for Cloud Asset... Architecture with CloudQuery SDK- Writing plug... 511 [0.02050386555492878, 0.010169642977416515, 0....

As an optional but recommended step, you can save the original blog content along with associated embeddings in a CSV file for reference later on so that you don't have to recreate embeddings if you want to reference it in another project.

# Save the dataframe with embeddings as a CSV file
df_new.to_csv('blog_data_and_embeddings.csv', index=False)

Part 2: Store Embeddings in a PostgreSQL Vector Database Using pgvector

Now that we have created embedding vectors for our blog content, the next step is to store the embedding vectors in a vector database to help us perform a fast search over many vectors.

What is a vector database?

A vector database is a database that can handle vector data. Vector databases are useful for:

  • Semantic search: Vector databases facilitate semantic search, which considers the context or meaning of search terms rather than just exact matches. They are useful for recommendation systems, content discovery, and question-answering systems.
  • Efficient similarity search: Vector databases are designed for efficient high-dimensional nearest neighbor search, a task where traditional relational databases struggle.
  • Machine learning: Vector databases store and search embeddings created by machine-learning models. This feature aids in finding items semantically similar to a given item.
  • Multimedia data handling: Vector databases also excel in working with multimedia data (images, audio, video) by converting them into high-dimensional vectors for efficient similarity search.
  • NLP and data combination: In Natural Language Processing (NLP), vector databases store high-dimensional vectors representing words, sentences, or documents. They also allow a combination of traditional SQL queries with similarity searches, accommodating both structured and unstructured data.

We’ll use PostgreSQL with the pgvector extension installed as our vector database. Pgvector extends PostgreSQL to handle vector data types and vector similarity search, like nearest neighbor search, which we’ll use to find the k most related embeddings in our database for a given user prompt.

Why use pgvector as a vector database?

Here are five reasons why PostgreSQL is a good choice for storing and handling vector data:

  • Integrated solution: By using PostgreSQL as a vector database, you keep your data in one place. This can simplify your architecture by reducing the need for multiple databases or additional services.
  • Enterprise-level robustness and operations: With a 30-year pedigree, PostgreSQL provides world-class data integrity, operations, and robustness. This includes backups, streaming replication, role-based and row-level security, and ACID compliance.
  • Full-featured SQL: PostgreSQL supports a rich set of SQL features, including joins, subqueries, window functions, and more. This allows for powerful and complex queries that can include both traditional relational data and vector data. It also integrates with a plethora of existing data science and data analysis tools.
  • Scalability and performance: PostgreSQL is known for its robustness and ability to handle large datasets. Using it as a vector database allows you to leverage these characteristics for vector data as well.
  • Open source: PostgreSQL is open source, which means it's free to download and use, and you can modify it to suit your needs. It also means that it benefits from the collective input of developers all over the world, which often results in high-quality, secure, and up-to-date software. PostgreSQL has a large and active community, so help is readily available. There are many resources, such as documentation, tutorials, forums, and more, to help you troubleshoot and optimize your PostgreSQL database.

2.1 Create a PostgreSQL database and install pgvector

First, we’ll create a PostgreSQL database. You can create a cloud PostgreSQL database in minutes for free on Timescale or use a local PostgreSQL database for this step.

Once you’ve created your PostgreSQL database, export your connection string as an environment variable, and just like the OpenAI API key, we’ll read it into our Python program from the environment file:

# Timescale database connection string
# Found under "Service URL" of the credential cheat-sheet or "Connection Info" in the Timescale console
# In terminal, run: export TIMESCALE_CONNECTION_STRING=postgres://<fill in here>

connection_string  = os.environ['TIMESCALE_CONNECTION_STRING']

We then connect to our database using the popular psycopg2 python library and install the pgvector extension as follows:

# Connect to PostgreSQL database in Timescale using connection string
conn = psycopg2.connect(connection_string)
cur = conn.cursor()

#install pgvector
cur.execute("CREATE EXTENSION IF NOT EXISTS vector");

2.2 Connect to and configure your vector database

Once we’ve installed pgvector, we use the register_vector() command to register the vector type with our connection:

# Register the vector type with psycopg2

Once we’ve connected to the database, let’s create a table that we’ll use to store embeddings along with metadata. Our table will look as follows:







  • Id represents the unique ID of each vector embedding in the table.
  • title is the blog title from which the content associated with the embedding is taken.
  • url is the blog URL from which the content associated with the embedding is taken.
  • content is the actual blog content associated with the embedding.
  • tokens is the number of tokens the embedding represents.
  • embedding is the vector representation of the content.

One advantage of using PostgreSQL as a vector database is that you can easily store metadata and embedding vectors in the same database, which is helpful for supplying the user-relevant information related to the response they receive, like links to read more or specific parts of a blog post that are relevant to them.

# Create table to store embeddings and metadata
table_create_command = """
CREATE TABLE embeddings (
            id bigserial primary key, 
            title text,
            url text,
            content text,
            tokens integer,
            embedding vector(1536)


2.3 Ingest and store vector data into PostgreSQL using pgvector

Now that we’ve created the database and created the table to house the embeddings and metadata, the final step is to insert the embedding vectors into the database.

For this step, it’s a best practice to batch insert the embeddings rather than insert them one by one.

#Batch insert embeddings and metadata from dataframe into PostgreSQL database
cur = conn.cursor()
# Prepare the list of tuples to insert
data_list = [(row['title'], row['url'], row['content'], int(row['tokens']), np.array(row['embeddings'])) for index, row in df_new.iterrows()]
# Use execute_values to perform batch insertion
execute_values(cur, "INSERT INTO embeddings (title, url, content, tokens, embedding) VALUES %s", data_list)
# Commit after we insert all embeddings

Let’s sanity check by running some simple queries against our newly inserted data:

cur.execute("SELECT COUNT(*) as cnt FROM embeddings;")
num_records = cur.fetchone()[0]
print("Number of vector records in table: ", num_records,"\n")
# Correct output should be 129

# print the first record in the table, for sanity-checking
cur.execute("SELECT * FROM embeddings LIMIT 1;")
records = cur.fetchall()
print("First record in table: ", records)

2.4 Index your data for faster retrieval

In this example, we only have 129 embedding vectors, so searching through all of them is blazingly fast. But for larger datasets, you need to create indexes to speed up searching for similar embeddings, so we include the code to build the index for illustrative purposes.

Pgvector supports the ivfflat index type to provide for speed up of approximate nearest neighbor (ANN) searches (similarity search indexes for high-dimensionality data is very often approximate).

You always want to build this index after you have inserted the data, as the index needs to discover clusters in your data to be effective, and it does this only when first building the index.

The index has a tunable parameter of the number of lists to use, and the code below shows the best practice for tuning this parameter. You also need to specify the distance measure used for indexing and ensure it matches the measure you use in your queries. In our case, we use the Cosine distance for querying below, and so we create our index with vector_cosine_ops.

# Create an index on the data for faster retrieval

#calculate the index parameters according to best practices
num_lists = num_records / 1000
if num_lists < 10:
   num_lists = 10
if num_records > 1000000:
   num_lists = math.sqrt(num_records)

#use the cosine distance measure, which is what we'll later use for querying
cur.execute(f'CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = {num_lists});')

Part 3: Nearest Neighbor Search Using pgvector

Given a user question, we’ll perform the following steps to use information stored in the vector database to answer their question using Retrieval Augmented Generation:

  1. Create an embedding vector for the user question.
  2. Use pgvector to perform a vector similarity search and retrieve the k nearest neighbors to the question embedding from our embedding vectors representing the blog content. In our example, we’ll use k=3, finding the three most similar embedding vectors and associated content.
  3. Supply the content retrieved from the database as additional context to the model and ask it to perform a completion task to answer the user question.

3.1 Define a question you want to answer

First, we’ll define a sample question that a user might want to answer about the blog posts stored in the database.

# Question about Timescale we want the model to answer
input = "How is Timescale used in IoT?"

Since Timescale is popular for IoT sensor data, a user might want to learn specifics about how they can leverage it for that use case.

3.2 Find the most relevant content in the database

Here’s the function we use to find the three nearest neighbors to the user question. Note it uses pgvector’s <=> operator, which finds the Cosine distance (also known as Cosine similarity) between two embedding vectors.

# Helper function: Get top 3 most similar documents from the database
def get_top3_similar_docs(query_embedding, conn):
    embedding_array = np.array(query_embedding)
    # Register pgvector extension
    cur = conn.cursor()
    # Get the top 3 most similar documents using the KNN <=> operator
    cur.execute("SELECT content FROM embeddings ORDER BY embedding <=> %s LIMIT 3", (embedding_array,))
    top3_docs = cur.fetchall()
    return top3_docs

3.3 Define helper functions to query OpenAI

We supply helper functions to create an embedding for the user question and to get a completion response from an OpenAI model. We use GPT-3.5, but you can use GPT-4 or any other model from OpenAI.

We also specify a number of parameters, such as limits of the maximum number of tokens in the model response and model temperature, which controls the randomness of the model, which you can modify to your liking:

# Helper function: get text completion from OpenAI API
# Note we're using the latest gpt-3.5-turbo-0613 model
def get_completion_from_messages(messages, model="gpt-3.5-turbo-0613", temperature=0, max_tokens=1000):
    response = openai.ChatCompletion.create(
    return response.choices[0].message["content"]

# Helper function: get embeddings for a text
def get_embeddings(text):
    response = openai.Embedding.create(
        input = text.replace("\n"," ")
    embedding = response['data'][0]['embedding']
    return embedding

3.3 Putting it all together

We’ll define a function to process the user input by retrieving the most similar documents from our database and passing the user input, along with the relevant retrieved context to the OpenAI model to provide a completion response to.

Note that we modify the system prompt as well in order to influence the tone of the model’s response.

We pass to the model the content associated with the three most similar embeddings to the user input using the assistant role. You can also append the additional context to the user message.

# Function to process input with retrieval of most similar documents from the database
def process_input_with_retrieval(user_input):
    delimiter = "```"

    #Step 1: Get documents related to the user input from database
    related_docs = get_top3_similar_docs(get_embeddings(user_input), conn)

    # Step 2: Get completion from OpenAI API
    # Set system message to help set appropriate tone and context for model
    system_message = f"""
    You are a friendly chatbot. \
    You can answer questions about timescaledb, its features and its use cases. \
    You respond in a concise, technically credible tone. \

    # Prepare messages to pass to model
    # We use a delimiter to help the model understand the where the user_input starts and ends
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": f"{delimiter}{user_input}{delimiter}"},
        {"role": "assistant", "content": f"Relevant Timescale case studies information: \n {related_docs[0][0]} \n {related_docs[1][0]} {related_docs[2][0]}"}   

    final_response = get_completion_from_messages(messages)
    return final_response

Let’s see an example of the model’s output to our original input question:

input = "How is Timescale used in IoT?"
response = process_input_with_retrieval(input)

Here’s the model’s response:

User input: How is Timescale used in IoT?

Model response:

TimescaleDB is commonly used in IoT (Internet of Things) applications for storing and analyzing time-series data generated by IoT devices. IoT devices generate a large volume of data over time, such as sensor readings, device status updates, and alarm information. TimescaleDB provides a scalable and efficient solution for storing and querying this time-series data.

With TimescaleDB, IoT applications can benefit from features such as automatic data partitioning, compression, and retention policies. These features allow for efficient storage and retrieval of time-series data, even as the data volume grows. TimescaleDB also supports SQL queries, making it easy to perform complex analytics and aggregations on the time-series data.

In IoT applications, TimescaleDB can be used to monitor and analyze various aspects, such as environmental conditions (temperature, humidity), energy consumption, equipment performance, and predictive maintenance. It enables real-time monitoring, anomaly detection, and trend analysis, helping businesses make data-driven decisions and optimize their operations.

Overall, TimescaleDB provides a reliable and scalable database solution for handling the high-volume time-series data generated by IoT devices, enabling businesses to unlock valuable insights and improve their IoT applications.

We can also ask the model questions about specific documents in the database, in this case about specific Timescale users who have spoken about their experience in a blog post:

input_2 = "Tell me about Edeva and Hopara. How do they use Timescale?"
response_2 = process_input_with_retrieval(input_2)

Here’s the model output for the input question above; notice how it uses specific details from the blog posts.

User input: Tell me about Edeva and Hopara. How do they use Timescale?

Model response: 

Edeva and Hopara are two companies that utilize TimescaleDB for their data storage and visualization needs.

Edeva uses TimescaleDB as the main database in their smart city system. Their clients can control their IoT devices and view captured data, gaining insights from trends and historical data. They leverage TimescaleDB's continuous aggregations feature to speed up queries and improve dashboard performance.

Hopara, on the other hand, provides a sophisticated visualization system that allows users to derive insights from various types of data. They use TimescaleDB to store real-time vibration data from sensor-tagged machines. Hopara's visualization system, powered by TimescaleDB, enables users to drill down into the data and identify vibration issues.

Both Edeva and Hopara benefit from TimescaleDB's time-series functionality and its ability to handle large amounts of data efficiently.


Retrieval Augmented Generation (RAG) is a powerful method of building applications with LLMs that enable you to teach foundation models about things it was not originally trained on—like private documents or recently published information.

We have now seen the basics of creating a chatbot to answer questions about a blog. This project is an example of how to create, store, and perform similarity search on OpenAI embeddings. We used PostgreSQL and pgvector as our vector database to store and query the embeddings.

Jupyter Notebook and Code: You can find all the code used in this tutorial in a Jupyter Notebook, as well as sample content and embeddings on the Timescale GitHub: timescale/vector-cookbook.

And if you’re looking for a production PostgreSQL database for your vector workloads, try Timescale. It’s free for 30 days, no credit card required.

Further reading

If you want to learn how to do both vector search and time-based filtering in PostgreSQL, see how you can refine vector search queries in this blog post.

Ingest and query in milliseconds, even at terabyte scale.
This post was a collaboration between
18 min read

Related posts