Jan 30, 2025
Posted by Team Timescale
When dealing with large volumes of text, generating meaningful summaries that condense data into actionable insights without losing essential context is critical. Large language models (LLMs) have transformed how we approach data summarization, offering powerful capabilities in understanding and summarizing text with human-like fluency. Top LLMs, such as Claude 3.5 Sonnet by Anthropic, excel at reasoning over complex language patterns, extracting key information, and generating coherent summaries while preserving the contextual depth of the original data.
In this guide, we’ll demonstrate how to use the Anthropic Claude 3.5 Sonnet model with PostgreSQL, powered by the open-source extensions pgvector and pgai, to summarize data upon insertion automatically. This approach eliminates the need for summarization as a separate step, ensuring that deriving insights from your data is neither complex nor cumbersome.
Data summarization turns large amounts of information into concise, actionable insights. This can boost decision-making and productivity in many critical areas in enterprise settings. For example, you can use summarization tools to identify themes and sentiments in customer feedback quickly. In academic research, summaries highlight key findings from extensive literature. Similarly, by summarizing operational logs, you can detect and resolve system issues efficiently. By turning complex data into clear summaries, organizations can navigate and act on information with speed and precision.
As one of the most popular relational databases, PostgreSQL offers an excellent platform for integrating automated summarization directly into your database workflows. This integration eliminates the need for external processing tools, enabling seamless and scalable summarization within your existing data infrastructure.
To automate data summarization in PostgreSQL, we will create a function that generates summaries from SQL data using the pgai utility ai.anthropic_generate. Along with this function, we will set up a trigger that executes whenever new data is inserted into a PostgreSQL table.
Let’s take a quick look at our two major building blocks.
In this guide, we will use the Anthropic Claude 3.5 Sonnet model. Claude 3.5 Sonnet is a state-of-the-art multimodal AI model that excels in tasks such as language understanding, coding, and visual reasoning. The model features a 200,000-token context window, which allows for comprehensive understanding and generation of large amounts of text.
To enable AI features in PostgreSQL, we will use the pgai extension. Pgai integrates AI capabilities directly into the database and enables seamless execution of AI tasks such as embedding generation, text summarization, and natural language queries. It acts as a bridge between PostgreSQL and advanced AI models.
Before we begin summarizing data in PostgreSQL with our tutorial, we need a few things in place.
First, you’ll need a working installation of PostgreSQL with the necessary extensions. You can install them manually or use a pre-built Docker container. Alternatively, you can opt for Timescale Cloud, which provides a free PostgreSQL instance with pgai and pgvector pre-installed.
We’ll use Timescale Cloud here. When launching the service, make sure you select the AI and Vector capability.
Once you have launched the PostgreSQL instance, note the database connection string, username, and password.
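For example, you can verify connectivity from a terminal with psql. The connection string below is a placeholder; substitute the host, port, user, and password shown on your Timescale Cloud service overview page:

```shell
# Placeholder values: replace <PASSWORD>, <HOST>, and <PORT> with the
# details from your Timescale Cloud service overview page
psql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
```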
PopSQL is a collaborative SQL editor that allows data teams to write queries, visualize data, and share results in real time. It is a powerful alternative to the psql command-line client and is available as a browser-based tool and a downloadable desktop client.
Sign up and set up the connection to your PostgreSQL server on PopSQL.
Since we’ll be using Claude 3.5 Sonnet in this guide, you’ll need an API key from Anthropic. Head to Build with Claude to obtain your API key—you’ll need it in the next step.
We can now proceed.
You can run the following SQL queries in your PopSQL client. First, enable the pgai and pgvector extensions with the following commands:
CREATE EXTENSION IF NOT EXISTS ai CASCADE;
CREATE EXTENSION IF NOT EXISTS vector CASCADE;
Set the Anthropic API key in a variable to use in subsequent operations:
SET ai.anthropic_api_key = 'YOUR-ANTHROPIC-API-KEY';
We will use the ecommerce_fashion dataset from Hugging Face, which contains information about fashion products, including attributes such as product images, names, prices, available colors, patterns, and detailed descriptions.
To load the dataset, use the load_dataset utility provided by the pgai extension. This utility makes loading Hugging Face datasets straightforward.
SELECT ai.load_dataset(
name => 'rajuptvs/ecommerce_products_clip',
table_name => 'ecommerce_fashion'
);
We will add two more columns to the dataset. The first is a serial primary key column, id, and the second is summary, which will store the AI-generated summary.
ALTER TABLE ecommerce_fashion
ADD COLUMN "summary" TEXT;
ALTER TABLE ecommerce_fashion
ADD COLUMN id SERIAL PRIMARY KEY;
Let’s now create a function named generate_summary in PostgreSQL to automate data summarization. This function uses the pgai function ai.anthropic_generate to call the Anthropic Claude API and generate concise, AI-driven summaries based on the information provided for each product. The generated summary will then be used to update the summary column.
CREATE OR REPLACE FUNCTION generate_summary(row_id INT)
RETURNS VOID AS $$
DECLARE
    product_record RECORD;
    prompt TEXT;
    response JSONB;
BEGIN
    -- Fetch the product row to summarize
    SELECT * INTO product_record
    FROM ecommerce_fashion
    WHERE id = row_id;

    -- Build the prompt from the product fields, substituting 'N/A' for missing values
    prompt := 'Summarize the following product details in 2 to 3 lines; exclude info that is not provided: ' ||
              'Product Name: ' || COALESCE(product_record."Product_name", 'N/A') || ', ' ||
              'Price: ' || COALESCE(product_record."Price", 'N/A') || ', ' ||
              'Colors: ' || COALESCE(product_record."colors", 'N/A') || ', ' ||
              'Pattern: ' || COALESCE(product_record."Pattern", 'N/A') || ', ' ||
              'Description: ' || COALESCE(product_record."Description", 'N/A') || ', ' ||
              'Other Details: ' || COALESCE(product_record."Other Details", 'N/A') || ', ' ||
              'Clipinfo: ' || COALESCE(product_record."Clipinfo", 'N/A') || '.';

    -- Call Claude 3.5 Sonnet through pgai
    SELECT ai.anthropic_generate(
        'claude-3-5-sonnet-20240620',
        jsonb_build_array(
            jsonb_build_object(
                'role', 'user',
                'content', prompt
            )
        )
    ) INTO response;

    -- Extract the generated text from the API response and store it
    UPDATE ecommerce_fashion
    SET "summary" = response->'content'->0->>'text'
    WHERE id = row_id;
END;
$$ LANGUAGE plpgsql;
In this function, we create a prompt by combining data from the fields in the SQL table and then use the ai.anthropic_generate function to generate the summary. We then use an UPDATE statement to write the generated text into the summary column. Depending on the structure of your SQL data, you can modify the function to create an AI-generated summary tailored to your specific data.
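As a sketch of such a modification, here is the same pattern adapted to a hypothetical support_tickets table. The table and column names (support_tickets, subject, body, summary) are illustrative, not part of this tutorial's dataset:

```sql
-- Hypothetical example: the same summarization pattern applied to a
-- support_tickets table (illustrative schema, not from the tutorial)
CREATE OR REPLACE FUNCTION generate_ticket_summary(ticket_id INT)
RETURNS VOID AS $$
DECLARE
    ticket RECORD;
    response JSONB;
BEGIN
    SELECT * INTO ticket FROM support_tickets WHERE id = ticket_id;

    SELECT ai.anthropic_generate(
        'claude-3-5-sonnet-20240620',
        jsonb_build_array(jsonb_build_object(
            'role', 'user',
            'content', 'Summarize this support ticket in one sentence: ' ||
                       'Subject: ' || COALESCE(ticket.subject, 'N/A') || '. ' ||
                       'Body: '    || COALESCE(ticket.body, 'N/A')
        ))
    ) INTO response;

    UPDATE support_tickets
    SET summary = response->'content'->0->>'text'
    WHERE id = ticket_id;
END;
$$ LANGUAGE plpgsql;
```

Only the prompt-building and the target column change; the call to ai.anthropic_generate and the response extraction stay the same.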
We will now create a PostgreSQL trigger function, trigger_generate_summary. This trigger function will call the generate_summary function that we created above.
CREATE OR REPLACE FUNCTION trigger_generate_summary()
RETURNS TRIGGER AS $$
BEGIN
PERFORM generate_summary(NEW.id);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Now, we will set up a trigger to execute after an insert operation on the ecommerce_fashion table. Here’s how you can do it:
CREATE TRIGGER after_insert_generate_summary
AFTER INSERT ON ecommerce_fashion
FOR EACH ROW
EXECUTE FUNCTION trigger_generate_summary();
Let’s insert some test data into the ecommerce_fashion table and check that the summary is generated automatically.
INSERT INTO ecommerce_fashion ("Product_name", "Price", "colors", "Pattern", "Description", "Other Details", "Clipinfo")
VALUES
(
'Women Drop Waist Red Dress', -- Product_name
'₹399', -- Price
'Red', -- colors
'Solid', -- Pattern
'Imported Knitted (Strachable) fabric Comfortable to Wear. Occasions: Party, Holiday, Tour, Trip, Daliy Wear, and Special Occasions, Like Some Parties etc. We do believe Fashion2wear does more than a supplier in women clothing. It also contributes its efforts to healthy and comfortable life-style.Fashion2wear is simple but not common place. join us now, simply bringing the latest fashion worldwide to your doorstep! Very nice quality dress, worth every dime. Fits like a glove with soft material to keep you comfortable. Great dress for either day or night depending on how you dress it up! This is a Flared Knee Long Dress Ideal for any Body Type, round neck, knee length, Full Sleeves, stretchable, Regular fit and Flare plain casual dress, suitable to spring summer autumn and winter, suitable to club, party,beach, date and home due to people have different taste in clothing as you know, someone like a slim(snug) fit, someone prefer a little loose (relax) fit, so please choose size according to size chart.', -- Description
'unknown', -- Other Details
'Red Solid Women Drop Waist Red Dress' -- Clipinfo
);
If you query the row, you’ll notice that the trigger has automatically updated the summary column.
SELECT "summary" FROM ecommerce_fashion WHERE id = 1914;
Here is the output:
Product: Women Drop Waist Red Dress +
Price: ₹399 +
Summary: This knee-length red dress features a solid pattern, full sleeves, and a round neck. Made from imported knitted fabric, it's comfortable and suitable for various occasions including parties and daily wear. The dress is flared and stretchable, suitable for different body types and seasons.
(1 row)
That’s it! We now have a system that automatically generates summaries from SQL data within PostgreSQL. We didn’t need any external tools, thanks to pgai, which allows us to use LLMs and a wide range of models directly within the PostgreSQL database.
In this guide, we’ve demonstrated how to leverage Anthropic Claude, pgvector, and pgai to automate data summarization directly within PostgreSQL. By integrating cutting-edge AI capabilities into your database, you can streamline workflows, eliminate the need for external processing tools, and derive actionable insights from your data in real time.
Pgai directly supports Anthropic, OpenAI, and Cohere, and you can also use it with Ollama to leverage any of the top models supported by Ollama.
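As a minimal sketch of swapping providers (assuming a local Ollama server with the llama3 model pulled, and pgai's ai.ollama_generate function), only the generation call in our function would change:

```sql
-- Hedged sketch: assumes Ollama is running locally with the llama3 model
-- pulled, and uses pgai's ai.ollama_generate function; the ->>'response'
-- accessor extracts the generated text from the returned JSONB
SELECT ai.ollama_generate(
    'llama3',
    'Summarize the following product details in 2 to 3 lines: ...'
)->>'response';
```

Note that each provider returns a differently shaped JSON payload, so the response-extraction expression must change along with the function call.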
To get started, sign up for Timescale Cloud, which comes pre-configured with pgai and pgvector, or use the extensions in your PostgreSQL installation to bring AI features to your database.