Pgai: Giving PostgreSQL Developers AI Engineering Superpowers

Pgai: Giving PostgreSQL Developers AI Engineering Superpowers

Introducing pgai, a new open-source extension that brings embedding and generation models closer to the database, making PostgreSQL an even better database for AI applications.

There’s an industry-wide push in companies to explore how they can leverage AI and large language models (LLMs) in their products and processes. And with that push, many professional developers find themselves in the category of “experienced developer but a beginner in AI.” Thanks to their engineering experience and expertise, these developers attempt to navigate the uncharted waters of applying AI and LLMs to build products, but without a traditional machine learning background.

In his seminal blog post “The Rise of the AI Engineer,” Shawn Wang (better known as Swyx) talks about this class of developers as “AI Engineers”—developers who specialize in applying AI models, tools, and APIs to build AI products. AI engineers stand in contrast to traditional AI practitioners who are primarily researchers and scientists.

Illustrative spectrum of technical roles in AI. Image taken from The Rise of the AI Engineer by Swyx and Alessio.
Illustrative spectrum of technical roles in AI. Image taken from The Rise of the AI Engineer by Swyx and Alessio.

Case in point: we spoke to a Timescale customer who was tasked with building a RAG application at their company. This engineer had decades of database and C experience but complained to us about fumbling around trying to learn Python to quickly build a proof-of-concept (PoC) to validate the core idea. “I wish I could do more of this in SQL” was a common refrain we heard from this and many other engineers we spoke to, ultimately sparking the idea for pgai.

Pgai is a PostgreSQL extension that brings more AI workflows to PostgreSQL, like embedding creation and model completion. Licensed under the Open Source PostgreSQL License, pgai further enriches the PostgreSQL AI ecosystem, making it easier for developers to build search and retrieval-augmented generation (RAG) applications. 

We built the pgai extension to give application developers familiar with PostgreSQL more tools to simplify their workflows and face fewer bumps in the road as they travel from idea to PoC to building production-ready AI applications. Simply put, we built pgai to help make more PostgreSQL developers AI engineers. 

Pgai helps make more application engineers who are familiar with databases like PostgreSQL become AI Engineers and build AI applications. Image modified from The Rise of the AI Engineer by Swyx and Alessio.
Pgai helps make more application engineers who are familiar with databases like PostgreSQL become AI Engineers and build AI applications. Image inspired by The Rise of the AI Engineer by Swyx and Alessio.

The initial release of pgai supports creating OpenAI embeddings and getting OpenAI chat completions from models like GPT4o directly from your PostgreSQL database. Pgai complements pgvector by storing embeddings in the pgvector data type and using Python and PL/Python to interact with model APIs from within a PostgreSQL database.

Next, we plan to add support for more models like Claude and Cohere, as well as open-source embedding and generation models via Ollama and HuggingFace endpoints. We welcome community contributions for models and functionality you want to see. To get involved, file an issue in the pgai GitHub repository or contribute a feature today.

We gave a sneak peek of pgai to a select group of developers who are building AI applications with PostgreSQL. Here’s what Web Begole, CTO of Market Reader, a company using PostgreSQL to build an AI-enabled financial information platform, had to say:

“Pgai is incredibly exciting for building AI applications with PostgreSQL. Having embedding functions directly within the database is a huge bonus. Previously, updating our saved embeddings was a tedious task, but now, with everything integrated, it promises to be much simpler and more efficient. This will save us a significant amount of time and effort.”

💡
Editor’s Note: To further complement pgvector in helping developers use Postgres to build AI applications, we built pgvectorscale, which improves pgvector’s scalability and performance making it as faster than Pinecone. Read more here.

How Pgai Simplifies Workflows for RAG Applications on PostgreSQL

To give you a taste of what’s possible with the pgai extension, let’s take a look at two common tasks developers building RAG applications face and how pgai can help streamline them:

  1. Creating embeddings for data
  2. LLM reasoning over data, like summarization, classification, or data enrichment

We’ll use the example of git commit messages as the text we’ll embed and reason over. The code snippet below shows installing the pgai extension and ingesting a CSV file of git commits into a table named commit_history. We’ll reference this table in the examples below:

-- install pgai (also installs pgvector via cascade)
create extension if not exists ai cascade;

-- a table for storing git commit history
create table commit_history
( id int not null primary key
, author text
, "date" timestamptz
, "commit" text
, summary text
, detail text
);

-- use psql's copy metacommand to load the csv into the table
\copy commit_history from 'commit_history.csv' with (format csv)

Pgai Example: Creating Embeddings From Text Data in PostgreSQL

Pgai simplifies creating embeddings from existing relational data in PostgreSQL. Let’s look at an example of using the pgai extension to generate an embedding from a table of git commits.

The code snippet below generates embeddings for each row in the commit_history table using OpenAI’s text-embedding-3-small model. The embeddings are stored in a new table called commit_history_embed. The code selects each row from the commit_history table, creates a single text string representation of the commit, and then embeds this text. The resulting embeddings are then inserted into the commit_history_embed table.

-- we want to embed each row of commit history and put the embedding in this table
create table commit_history_embed
( id int not null primary key
, embedding vector(1536) -- the vector type comes from the pgvector extension
);

-- select from the first table, embed the content, and insert in the second table
insert into commit_history_embed (id, embedding)
select
  id
, openai_embed
  ( 'text-embedding-3-small'
    -- create a single text string representation of the commit
  , format('author: %s date: %s commit: %s summary: %s detail: %s', author, "date", "commit", summary, detail)
  ) as embedding
from commit_history
;

In this example, the embeddings are inserted into a new table, but they can also be stored in a new column in the table where the text data is housed.

Pgai Example: Classification and Summarization on Text Data in PostgreSQL

Pgai enables classification, summarization, and data enrichment tasks on existing relational data. Let’s look at an example of summarization where we use pgai to output release notes in markdown format from a table of git commit messages. 

In the code snippet below, we use pgai to perform a summarization task on the rows in thecommit_history table. The code uses pgai to call OpenAI’s GPT4o model to perform the summarization task, which can be stored in another column or output as a response. We show the generated release notes in the image below.

-- summarize and categorize git commits to produce a release notes document
select openai_chat_complete
( 'gpt-4o'
, jsonb_build_array
  ( jsonb_build_object
    ( 'role', 'system'
    , 'content', 'You are a software release engineer who summarizes git commits to produce release notes.'
    )
  , jsonb_build_object
    ( 'role', 'user'
    , 'content'
    , -- build up a list of the commit details to append to the prompt
      concat
      ( E'Summarize the following list of commits from the timescaledb git repo from August 2023 in a release notes document in markdown format.\n\n'
      , string_agg(x.commit_desc, E'\n\n')
      )
    )
  )
)->'choices'->0->'message'->>'content'
from
(
    -- convert each to a text format
    select format
    ( E'%s %s\n\tcommit: %s\n\tauthor: %s\n\tdate: %s\n\tdetail: %s'
    , row_number() over (order by "date")
    , summary
    , "commit"
    , author
    , "date"
    , detail
    ) as commit_desc
    from commit_history
    -- just look at commits from August 2023
    where date_trunc('month', "date") = '2023-08-01 00:00:00+00'::timestamptz
    order by "date"
) x
;

Here’s the resulting release notes generated by the above query:

Markdown of release notes created by GPT4o via pgai as a result of summarizing commit messages housed in a PostgreSQL table. This is just a taste of the summarization, categorization, and advanced text processing available to you right inside PostgreSQL via pgai.
Markdown of release notes created by GPT4o via pgai as a result of summarizing commit messages housed in a PostgreSQL table. This is just a taste of the summarization, categorization, and advanced text processing available to you right inside PostgreSQL via pgai.

Use Pgai Today

Pgai is open source under the PostgreSQL License and is available for you to use in your AI projects today. You can find installation instructions on the pgai GitHub repository. You can also access pgai on any database service on Timescale’s cloud PostgreSQL platform.

How to get involved

Pgai is an effort to enrich the PostgreSQL ecosystem for AI. If you’d like to help, here’s how you can get involved:

  • Join the Postgres for AI Discord: Join a community of developers building AI applications with PostgreSQL in the pgai Discord server. Share what you’re working on, and help or get helped by a community of peers.
  • Share the news with your friends and colleagues: Share our posts announcing pgai on X/Twitter, LinkedIn, and Threads. We promise to RT back.
  • Submit issues and feature requests: We encourage you to submit issues and feature requests for functionality you’d like to see, bugs you find, and suggestions you think would improve pgai. Head over to the pgai GitHub repo to share your ideas.
  • Make a contribution: We welcome community contributions for pgai. Pgai is written in Python and PL/Python. Let us know which models you want to see supported, particularly for open-source embedding and generation models. See the pgai GitHub for instructions to contribute.
  • Offer the pgai extension on your PostgreSQL cloud: Pgai is an open-source project under the PostgreSQL License. We encourage you to offer pgai on your managed PostgreSQL database-as-a-service platform and can even help you spread the word. Get in touch via our Contact Us form and mention pgai to discuss further.

Let's make Postgres a better database for AI, together!

This post was a collaboration between
7 min read
AI
Contributors

Related posts