Building Blockchain Apps on Postgres

Time is an essential dimension of every blockchain. Without it, blockchains could not operate: transactions and blocks could not be added. Data associated with blockchains is inherently time-series data.

At Timescale, we're always looking for ways to showcase the expanding reach of time-series data. Using our PostgreSQL and TimescaleDB superpowers, we are able to generate insight into the data behind everything in the world. With time-series data, we’re not only able to see how things are but also how things have changed and are in the process of changing. Measuring change is an essential component of better understanding the world.

In this post, we’ll look at tips and strategies for building blockchain apps on Postgres. We’ll also look at the ways that Postgres and TimescaleDB support each other to create more dynamic apps.

Blockchain Data Is Time-Series Data

Blockchain data combines many different types of data—including time-series data. You can get a lot of value from combining these datasets. 

In a past blog post, I did exactly that: I analyzed the Bitcoin (BTC) blockchain with PostgreSQL, looking at the transaction volume and BTC USD rate to look into how they correlate and why.

A line graph showing the transaction volume and BTC USD rate

I think this kind of analysis is really interesting because it combines two different datasets. On one side, you have the transaction volume, which you can get directly from the blockchain by ingesting the on-chain data and then rolling all the transactions up into daily or weekly aggregates. On the other side, you combine this dataset with cryptocurrency market data, which gives you the BTC/USD rate. And all of this happens within the PostgreSQL database.
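
To give you an idea, here's a minimal sketch of such a query. It assumes a transactions hypertable (like the one defined later in this post) and a hypothetical btc_prices table with a time column and a rate_usd column holding the BTC/USD rate:

-- Daily transaction volume joined with the daily average BTC/USD rate
WITH daily_volume AS (
   SELECT time_bucket('1 day', time) AS day, count(*) AS tx_count
   FROM transactions
   GROUP BY day
), daily_price AS (
   SELECT time_bucket('1 day', time) AS day, avg(rate_usd) AS btc_usd
   FROM btc_prices  -- hypothetical table of BTC/USD rates
   GROUP BY day
)
SELECT v.day, v.tx_count, p.btc_usd
FROM daily_volume v
JOIN daily_price p USING (day)
ORDER BY day;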

Blockchain Apps Need Dynamic Databases

Some time-series databases won’t allow you to store different types of datasets, so you end up with time-series data in one database and your relational data in another. To analyze all this data, you have to combine the two at the application layer, which, as we all know, can be a headache.

With TimescaleDB, because we’re PostgreSQL, you can store all sorts of data in the database: we are a general-purpose database, but with heavy optimizations for time-series data, and blockchain data is exactly that.

Here’s another example of raw data you could use as a dataset in your data application.

Bitcoin transaction data in raw form

This is on-chain data from the BTC blockchain. As you can see, we have a time column (which will allow you to join this dataset with others), a block ID, the transaction hash, size, weight, and so on.

Looking at both these examples, you can notice a few things:

  • Both are blockchain data: they contain information recorded on the blockchain.
  • Both datasets are timestamped—they’re time-series data.
  • Blockchains are append-only: you can add new records or blocks, but you can never modify historical blocks. In that sense, they are technically time-series databases.
  • Blockchain data is real-time data, meaning new data is being processed on the blockchain as you read these words.

So, if you’re building a real-time data application, you must ensure that your database can handle this type of data, which isn’t necessarily easy with analytical databases. This is one of the main reasons why users try TimescaleDB: it’s a relational database but performs amazingly well with analytics and historical data.

Finally, blockchain data is connected to other datasets. If you work for a business, for example, you can pull internal data and combine it with public blockchain data.

Essential database features

Now let’s look at the features a database must have to seamlessly handle blockchain data. 

  • Scalable for time-series data. As we noted, blockchain data is time-series data. You need a database that can handle it.
  • Historical and real-time capabilities. Real-time queries are very different from historical queries, and serving both from the same database (say, to power a customer-facing application) can be very hard.
  • Avoid data silos. This is incredibly important if you’re combining different datasets. If you start exploring the BTC blockchain or cryptocurrency data, you’ll probably want to combine them with another dataset after a while.

PostgreSQL Changes the Game on Blockchain Storage

To explore the world of Bitcoin transactions, you need to ensure your database infrastructure is reliable, scalable, and maintainable—especially if you’re building a data application on top of this data.

Using PostgreSQL to store blockchain data gives you a reliable foundation that can keep up with the relentless nature of blockchains—which are always up and running. 

You need a schema to store anything in a relational database like TimescaleDB. Let's look at a simple yet effective schema you can use to start storing Bitcoin transactions.

Database schema for Bitcoin transactions

The dataset collected for this blog post uses Blockchair’s data dumps. Those data files, just like the data on the actual blockchain, contain more data fields than this article covers. Still, this example schema provides several data fields (transaction weight, output, fee, and so on) that can get you started and help you generate insights.

Nonetheless, this schema includes a details JSONB column that stores all additional information, in case you decide to analyze data fields not covered in this post.

Data field descriptions:

  • time: timestamp of the transaction
  • block_id: ID of the block containing the transaction
  • hash: hash ID of the transaction (its unique identifier)
  • size: size of the transaction in KB
  • weight: size of the transaction in weight units
  • is_coinbase: whether this is a coinbase (miner revenue) transaction
  • output_total: value of the transaction in satoshi (sat)
  • output_total_usd: value of the transaction in USD
  • fee: transaction fee in satoshi (sat)
  • fee_usd: transaction fee in USD
  • details: additional details about the transaction (e.g., input_total, lock_time, fee_per_kwu, etc.)

Database schema:

CREATE TABLE transactions (
   time TIMESTAMPTZ,
   block_id INT,
   hash TEXT,
   size INT,
   weight INT,
   is_coinbase BOOLEAN,
   output_total BIGINT,
   output_total_usd DOUBLE PRECISION,
   fee BIGINT,
   fee_usd DOUBLE PRECISION,
   details JSONB
);
 
-- Turn the table into a hypertable, partitioned by the time column
SELECT create_hypertable('transactions', 'time');

This schema vastly simplifies analyzing the dataset while leaving room for future changes: if you later find the extra data fields useful, they remain available through the details JSONB column.
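
For example, here's a minimal sketch of pulling a field out of details (lock_time is just an assumed key; any key stored in the JSONB column works the same way):

-- Extract a single field from the details JSONB column
SELECT time, hash, (details->>'lock_time')::BIGINT AS lock_time
FROM transactions
WHERE time > NOW() - INTERVAL '1 day';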

The create_hypertable function transforms the regular PostgreSQL table into a TimescaleDB hypertable to partition the table based on the timestamp column, thus optimizing the querying and ingestion of time-series data—while behaving just like a regular PostgreSQL table.

Using Indexes to Scale Your App for Blockchain

To make your database “blockchain-scale,” you need to apply some optimization strategies in PostgreSQL and TimescaleDB to enjoy a satisfying experience while maintaining and querying the data.

Using a hypertable to store transactions lays the groundwork for simple and scalable time-series storage. TimescaleDB, being a PostgreSQL database, lets you use all the PostgreSQL optimization strategies (e.g., indexes) along with TimescaleDB features (like continuous aggregates) to get the best of both tools.

In sum, you can achieve the best experience in your database if you use the things that make PostgreSQL exceptional in the relational database world and TimescaleDB exceptional in the time-series world.

Let’s start with indexes.

B-tree index

When you create a hypertable, TimescaleDB automatically adds a B-tree index on the timestamp column of the table. This already improves most of the queries where you filter by the time column.

For example, this is how you can find the most expensive transactions (by fee paid) from the past two weeks:

SELECT
 time,
 hash,
 fee*0.00000001 as fee_btc,
 output_total*0.00000001 as value_btc,
 output_total_usd as value_usd
FROM transactions
WHERE "time" > NOW() - INTERVAL '2 weeks'
ORDER BY fee_btc DESC;

Most expensive transactions (by fee paid in BTC)

Just by using TimescaleDB, this query already performs better than on regular PostgreSQL: under the hood, TimescaleDB uses the index and excludes chunks outside the defined time period. You can also add more indexes based on the kind of queries you plan to write.

For example, if you plan to build a data application that analyzes individual Bitcoin transactions, you can achieve better query performance by adding an index to the hash column.

Each transaction on the blockchain has a hash value, a 64-character long string that acts as the transaction’s unique identifier. If your application users search for a specific transaction to find out more about it, this is the column they should use to find the transaction in the table.

SELECT * FROM transactions
WHERE hash = '94fb53b99dd2cb7ccf26745c908eebd7c01cb8c19fb84819b796beb7750b0cb8';

You can speed up this query by adding a HASH INDEX to the hash column.

CREATE INDEX hash_idx ON public.transactions USING HASH (hash);

If you store more than one hundred million rows of transactions and want to filter by the hash column, this index is essential. It has the power to turn a query that runs for 10 or more seconds into a query that finishes in less than a second.

You can also ensure that you don’t accidentally insert duplicate records by adding a UNIQUE INDEX on the time and hash columns.

CREATE UNIQUE INDEX time_hash_idx ON public.transactions (time, hash);
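
With that unique index in place, you can also make ingestion idempotent. A minimal sketch (the values are placeholders; the hash is the one from the earlier example):

-- Re-running an ingestion batch won't create duplicates:
-- rows that collide on (time, hash) are silently skipped
INSERT INTO transactions (time, block_id, hash, size, weight, is_coinbase,
                          output_total, output_total_usd, fee, fee_usd, details)
VALUES ('2023-06-01 12:00:00+00', 793000,
        '94fb53b99dd2cb7ccf26745c908eebd7c01cb8c19fb84819b796beb7750b0cb8',
        351, 1404, false, 1000000, 271.5, 25000, 6.79, '{}')
ON CONFLICT (time, hash) DO NOTHING;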

Continuous Aggregates Can Speed Up Queries

Continuous aggregates are materialized views for time-series data. They were designed with time-series data in mind and provide an efficient way to refresh and query the stored values; they can even serve real-time data. After setting up proper indexes, this is the most important optimization you can make to get lightning-fast queries.

When it comes to time-series data, users usually aggregate data based on time, creating so-called “time buckets.” Time buckets enable you to analyze how things change over time. In the case of the Bitcoin blockchain, it allows you to see how transaction fees, transaction sizes, block weights, etc., change over time.
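
For example, a typical time-bucketed query over the schema above might track how average fees change per day (a sketch):

-- Average transaction fee per one-day bucket over the last month
SELECT time_bucket('1 day', time) AS bucket,
      avg(fee) AS avg_fee_sat,
      avg(fee_usd) AS avg_fee_usd
FROM transactions
WHERE time > NOW() - INTERVAL '1 month'
GROUP BY bucket
ORDER BY bucket;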

These aggregated time-series queries are often recurring queries done by a dashboard or some user-facing application. Hence, it’s important to make these queries as fast as possible to provide a smooth user experience. This is precisely what TimescaleDB’s continuous aggregates will do.

You can set up continuous aggregates to pre-aggregate and materialize information within the database and go from seconds or even minutes to sub-seconds in query execution time.

Even when analyzing a large blockchain database with more than 150 million rows, you can achieve great performance and sub-second query execution speed just by using indexes and continuous aggregates.

Let’s see examples of continuous aggregate setups to store and serve blockchain data.

Three examples of continuous aggregates for blockchain analytics

Continuous aggregates, set up properly, can speed up your queries and make those queries much easier to write. Let’s choose three aspects of the data as foundations for the continuous aggregate definitions:

  • Transactions
  • Blocks
  • Miners (revenue)

Transactions

Blockchains are designed to record information without ever updating historical data. When it comes to Bitcoin, the blockchain is just a digital public ledger containing information about transactions. Simply put, a transaction tells you who sent what amount to whom, plus the technical details.

Each transaction has at least one input and one output field, which indicate which address(es) were used to send BTC to which address(es). You can have multiple “senders” and “receivers” in the same transaction.

Transactions are not mined separately; they are mined (added to the blockchain) in groups. These groups are called blocks.
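
A transaction-level continuous aggregate could look like this (a minimal sketch; the view name and the chosen statistics are assumptions you can adapt):

-- Materialize daily transaction statistics as a continuous aggregate
CREATE MATERIALIZED VIEW transactions_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS bucket,
      count(*) AS tx_count,
      avg(fee) AS avg_fee_sat,
      sum(output_total) AS total_output_sat
FROM transactions
GROUP BY bucket;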

Blocks

A blockchain is made up of blocks. These blocks are important from a data analytics perspective for many reasons. Miners get their revenue based on the block they add to the blockchain. Each block has a unique identifier called block_id. Before SegWit, a block had a size limit of 1 MB. After SegWit, a new measure was added to the network called weight unit, and the new limit for a block became four million weight units.
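
Since every transaction row carries its block_id, a block-level continuous aggregate can be derived from the same table by adding block_id to the grouping (a sketch, under the same assumptions as above):

-- Per-block statistics, materialized as a continuous aggregate
CREATE MATERIALIZED VIEW blocks_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS bucket,
      block_id,
      count(*) AS tx_count,
      sum(weight) AS block_weight,
      sum(size) AS block_size
FROM transactions
GROUP BY bucket, block_id;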

Miners

Miners are essential for any blockchain. They keep the network up and running. Miners are incentivized with BTC to maintain and validate the order of blocks on the blockchain. In the case of Bitcoin, miners get their revenue from two sources: block rewards and transaction fees. The block reward is currently 6.25 BTC, and it gets halved every 210,000 blocks mined (about every four years). Blockchain users pay the transaction fee every time they submit a transaction.
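
Because block rewards and collected fees flow to miners through coinbase transactions, you can sketch a miner revenue continuous aggregate by filtering on is_coinbase:

-- Daily miner revenue: coinbase transactions carry the block reward plus fees
CREATE MATERIALIZED VIEW miner_revenue_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS bucket,
      count(*) AS blocks_mined,
      sum(output_total) AS revenue_sat,
      sum(output_total_usd) AS revenue_usd
FROM transactions
WHERE is_coinbase
GROUP BY bucket;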

With continuous aggregates and automation policies, you can simplify your work going forward (see the policy sketch after this list):

  • You save a ton of resources by materializing (saving the results to disk) the time-based aggregations you often use, which means that the database doesn’t need to recompute the information at query time.
  • The views get maintained automatically due to the refresh policies, so you can just focus on analyzing the data.
  • Real-time aggregations become simple because continuous aggregates combine the query results from the materialized table and the underlying hypertable by default (saving you from doing a UNION in all of your queries).
  • You can compress or delete data from the continuous aggregates on schedule—or delete the underlying raw data in the hypertable and keep the downsampled data in the continuous aggregates.
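
Here's what that automation might look like for the transactions_daily view sketched earlier (all intervals are illustrative, not recommendations):

-- Refresh the continuous aggregate every hour, leaving the most
-- recent hour to real-time aggregation
SELECT add_continuous_aggregate_policy('transactions_daily',
   start_offset => INTERVAL '3 days',
   end_offset => INTERVAL '1 hour',
   schedule_interval => INTERVAL '1 hour');

-- Drop raw data older than a year; the downsampled view keeps the history
SELECT add_retention_policy('transactions', drop_after => INTERVAL '1 year');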

After setting up indexes and continuous aggregates as optimizations, you can focus on analyzing the data and generating insights while having a fast and simple querying experience.

Now that you know which features we used to analyze our data using PostgreSQL and TimescaleDB, let’s look at the thought process behind it.

Why TimescaleDB Is the Best Database for Blockchain Apps

Of course, we think our database is the best option. Give us a chance to explain why with this brief architectural overview.

Storage layer

TimescaleDB is built on PostgreSQL. We are not PostgreSQL-like or a PostgreSQL fork. We are 100 percent PostgreSQL because we’re an extension, which means that our storage layer is the same that you’ll find in PostgreSQL.

Schema management

However, we modified PostgreSQL’s schema management by introducing auto-partitioning. Very simply, we partition your data based on the time column, providing long-term scalability and some other significant advantages.

Let’s say, for example, that you want to drop one or two years of data. In a plain relational database, you’d run a DELETE statement, removing rows one by one. With auto-partitioning, or what we call the chunk mechanism, you can instead set a data retention policy and ask the database to delete only data that is, say, one year or older. The database then drops entire chunks (child tables), which is far faster and more efficient than row-by-row deletion.
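
For example, a one-off cleanup is a single function call (the interval is illustrative):

-- Drop every chunk that contains only data older than one year
SELECT drop_chunks('transactions', older_than => INTERVAL '1 year');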

Query planner

We also modified the query planner, particularly for time-series operations such as filtering or aggregating by time, which is exactly what you’d do when analyzing blockchain data. Whether you’re creating one-day or five-minute time buckets, our planner features make those queries flow seamlessly and perform better.

SQL API

We are full SQL, meaning everything that works in a PostgreSQL database will also work in TimescaleDB. We don’t take anything away from PostgreSQL; we just add a lot of new functionality, like time-series functions.

These will help you analyze time-series data and blockchain data. For example, we have functions that help you create the values needed to build candlestick charts, so you can generate the open, high, low, and close values within the database from raw data.
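
The dedicated candlestick hyperfunctions ship with the Timescale Toolkit, but you can already sketch OHLC values with TimescaleDB's first() and last() aggregates over the schema from this post (here over per-transaction USD values, for illustration):

-- One-hour OHLC candlesticks of per-transaction USD value
SELECT time_bucket('1 hour', time) AS bucket,
      first(output_total_usd, time) AS open,
      max(output_total_usd) AS high,
      min(output_total_usd) AS low,
      last(output_total_usd, time) AS close
FROM transactions
GROUP BY bucket
ORDER BY bucket;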

We also provide a bunch of other statistical functions that can be useful for analyzing blockchain data.

Next steps


One last piece of advice: if you’re working with blockchain data, try continuous aggregates.

They’re relatively simple to implement and bring enormous benefits, especially if you’re building a customer-facing analytics platform, or software that isn’t a traditional data application but still provides analytics to its users.

We hope you feel inspired to do your own research with PostgreSQL and TimescaleDB. The easiest way to get started is to sign up for a free Timescale account and complete the tutorial, which walks you through sourcing, ingesting, storing, and analyzing blockchain data.

If you have questions for the community or Timescale team members, feel free to join the Timescale Forum and Slack Community, where we’d be delighted to help you out!
