Building Bitcoin Blockchain Data Applications With PostgreSQL
This blog post is based on a keynote presentation by Timescale’s developer advocate, Attila Toth, at Bitcoin Amsterdam 2022.
As a developer advocate for Timescale, the builder of TimescaleDB, a time-series database based on PostgreSQL, I find analyzing and working with blockchain data particularly interesting. Unsurprisingly, this was the theme I chose for my presentation earlier this year at Bitcoin Amsterdam.
The talk lasted for about half an hour (with Q&A time), and you can watch the whole session here:
In the meantime, I’ll leave you with some of the main points on building Bitcoin blockchain applications using PostgreSQL, the best database to store them, and why it’s TimescaleDB.
Blockchain Data Is Time-Series Data
Blockchain data combines many different types of data—including time-series data—and you can get a lot of value from combining these datasets. In a past blog post, I did exactly that: I analyzed the Bitcoin (BTC) blockchain with PostgreSQL, looking at the transaction volume and BTC USD rate to look into how they correlate and why.
I think this kind of analysis is really interesting because these are two different datasets: on one side, you have the transaction volume, which you might be able to get directly from the blockchain by ingesting the on-chain data, and then you aggregate all the transactions by daily or weekly aggregates; on the other side, you combine this dataset with cryptocurrency data, which has the BTC USD rate. And again, this is all happening within the PostgreSQL database.
Now, some time-series databases won’t allow you to store different types of datasets within the database, so you end up with time-series data in one database and then your relational data in another database. To analyze all this data, you have to sort of combine them on an application layer, and that, as we all know, can be a headache.
With TimescaleDB, because we’re PostgreSQL, you can store all sorts of data in the database: we are a general database, but we also did a lot of optimizations for time-series, a.k.a., blockchain data.
Another raw data example that can be used as a dataset in your data application database is this one.
This is data from the BTC blockchain, on-chain data. As you can see, we have a time column (that will allow you to join this dataset with others), block ID, hash size, block size, etc.
Looking at both these examples, you can notice a few things:
- They are considered blockchain data because they contain information about the blockchain.
- Both datasets are timestamped—they’re time-series data.
- Since blockchains are technically time-series databases, they are append-only: you can add new records or blocks to the blockchain, but you can never modify historical blocks.
- Big blockchain data is real-time data, meaning new data is being processed on the blockchain as you read these words.
So, if you’re building a real-time data application, you must ensure that your database can handle this type of data, which isn’t necessarily easy with analytical databases. This is one of the main reasons why users try TimescaleDB: it’s a relational database but performs amazingly well with analytics and historical data.
- Finally, blockchain data is connected to other datasets.
Even if you work for a business, you can pull internal data and combine it with blockchain public data. There are a lot of possibilities to explore, which is just one of the reasons I’m interested in this data type.
What Is the Best Database to Store Blockchain Data?
Now that we have defined blockchain data, we can establish which features a database must have to handle this type of data seamlessly. You can guess what my choice is. :)
Scalable for time-series data
Since blockchain data is time-series data, you need a database that is scalable for time-series data. The foundation for time-series scalability in TimescaleDB is that we use auto-partitioning based on the time column. Even though we are PostgreSQL, whenever you insert data into a table, it will get partitioned under the hood into multiple tables based on the time column. This is what provides the foundation for long-term scalability for your time-series data.
Historical and real-time capabilities
Real-time queries are very different from historical queries, for example. If you want to provide real-time data on a customer-facing application, that can be very hard to do in the same database. TimescaleDB solves this problem. We can combine the pre-aggregated historical data with the real-time data that is aggregated on the fly. That is not available on PostgreSQL or any other relational database.
Avoid data silos
At last, you need a database that will help you avoid data silos. Again, this is incredibly important if you’re combining different datasets. If you start exploring the BTC blockchain or cryptocurrency data, you’ll probably want to combine them with another dataset after a while.
Fun fact: the founders of Timescale created TimescaleDB because they were building a data application and found that it’s really hard to combine time-series data with other datasets. So, they spent a lot of time implementing this on the application layer: basically, they spent entire developer sprints coding something that really should be in the database.
Needless to say, this is where TimescaleDB really shines. We are open-source and free, you can try us on GitHub, and we also have a cloud offering if you want to go in the cloud.
Timescale Cloud offers additional benefits, from worry-free operations to continuous backup and recovery and zero-downtime upgrades—all with the same supercharged PostgreSQL foundation. You can try a free 30-day trial; no credit card required.
The Best Database for Blockchain Data: An Architecture Overview
TimescaleDB is built on PostgreSQL. We are not PostgreSQL-like or a PostgreSQL fork. We are 100 percent PostgreSQL because we’re an extension, which means that our storage layer is the same that you’ll find in PostgreSQL.
However, we modified PostgreSQL’s schema management by introducing auto-partitioning. Very simply, we partition your data based on the time column, providing long-term scalability and some other significant advantages.
Let’s say, for example, that you want to drop one or two years of data. In a relational database, you’d do a
DELETE statement, deleting rows one by one. With auto-partitioning, or what we call a chunk mechanism, you can set data retention policies and ask the database to only delete data that is one year or older, for example. The database will then drop tables, making it a faster and more efficient process due to the chunk mechanism.
We sometimes modify the query planner, especially with time-series data, if you’re filtering by time or aggregating by time, which is something you’d normally do if you’re analyzing blockchain data. So maybe you’re creating one-day time buckets or five-minute time buckets: either way, we’ll have features that will make the process flow seamlessly and perform better.
We are full SQL, meaning everything that works in a PostgreSQL database will also work in TimescaleDB. We don’t take anything away from PostgreSQL; we just add a lot of new functionality, like time-series functions.
These will help you analyze time-series data and blockchain data. For example, we have functions that will help you create the values needed to build candlestick charts. So you can generate the open, high, low, and close values within the database from row data.
We also provide a bunch of other statistical functions that can be useful to analyze blockchain data:
I didn’t go too deep on any of these features during the presentation, but I did leave attendees with one piece of advice: if you’re working with blockchain data, try continuous aggregates. They’re relatively simple to implement and will bring enormous benefits, especially if you’re building a customer-facing analytics platform or maybe software that is not a traditional data application but provides analytics for the users.
If you want to learn more about any of the features mentioned in the above slide, here’s a comprehensive list of resources:
And finally, I suggest that you check out our Crypto page. We have many helpful, hands-on tutorials on analyzing Bitcoin and blockchain data using TimescaleDB.
Plus: we show you how to get access to data if you don’t have a data provider, examples on how to write a data pipeline and ingest it with TimescaleDB, alongside simple queries that you can use to build a blockchain explorer, cryptocurrency analysis tool, or an app with a customer-facing analytics feature.
To get started today and add even more functionality to PostgreSQL, explore TimescaleDB. It extends PostgreSQL with all the great features I already listed. And if you’re using a managed service for PostgreSQL, try Timescale Cloud—it’s free for 30 days, no credit card required.