Processing and Protecting Hundreds of Terabytes of Blockchain Data: Zondax’s Story
This is an installment of our “Community Member Spotlight” series, where we invite our customers to share their work, shining a light on their success and inspiring others with new ways to use technology to solve problems.
In this edition, Ezequiel Raynaudo, data team leader at Zondax, explains how the software company consumes and analyzes hundreds of terabytes of blockchain data using TimescaleDB to create complete backend tech solutions.
About the Company
Zondax is a growing international team of more than 30 experienced professionals united by a passion for technology and innovation. Our end-to-end software solutions are widely used by many exchanges, hardware wallets, privacy coins, and decentralized finance (DeFi) protocols. Security is one of the highest priorities in our work, and we aim to provide the safest and most efficient solutions for our clients.
Founded in 2018, Zondax is considered the industry leader in Ledger app development, with more than 45 applications built to date and more nearing production. Since the company's inception, our team has been building and delivering high-quality backend tech solutions for leading blockchain projects of prominent clients, such as Cosmos, Tezos, Zcash, Filecoin, Polkadot, ICP, Centrifuge, and more.
“The database is a critical part of the foundation that supports our software services for leading blockchain projects, financial services, and prominent ecosystems in the blockchain space”
We put a lot of emphasis on providing maximum safety and efficiency for the ecosystem. Our services fall into four categories: security, data indexing, integration, and protocol engineering. Each category has designated project leaders who manage the projects together with experienced engineers.
About the Team
My team (which currently has two people but is looking to grow by the end of the year) manages the blockchain data in various projects and ensures the tools needed for the projects are well-maintained. For example, we track the dynamic sets of variables of different blockchains and build advanced mathematical models to extract insights from them through data analytics.
The database is a critical part of the foundation that supports our software services for leading blockchain projects, financial services, and prominent ecosystems in the blockchain space. That is why we take serious steps to ensure that our blockchain data processing remains effective and that the quality of the results meets Zondax's high standards.
We welcome professionals from different cultures, backgrounds, fields of experience, and mindsets. New ideas are always encouraged, and the diversity within the team has been helping us to identify various potential advancements we can make in leading blockchain projects.
At the same time, our drive to experiment and search for creative, efficient solutions keeps us paying close attention to the latest technologies and innovations, which we immediately incorporate into our work. We never get bored at Zondax!
Since the Covid-19 pandemic, many companies and teams have switched to remote work temporarily or permanently, and for Zondax this is familiar ground. We adopted a culture of remote work from the start, which has been rewarding and encouraging as team members from around the globe often spark fun and constructive discussions despite nuanced cultural differences. And in terms of quality of work, the tools and platforms we have been using accommodate the needs for smooth communication and effective collaboration within different teams.
About the Project
Zondax provides software services to various customers, including leading projects in the blockchain ecosystem and financial services. We consume and analyze blockchain data in many ways and for multiple purposes:
- As input for other services and ecosystems provided by Zondax, and for third parties
- For data science, where mathematical models over each blockchain's dynamic set of variables yield insights
- For financial services
- For blockchain data backups
The smallest unit of a blockchain, called a “block” on most networks, contains a timestamp field among several other substructures. That timestamp lets you place every block at a specific point in time in the blockchain's history, so the data is naturally ordered by time. A database engine that can leverage that property is therefore a go-to option.
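To make the timestamp property concrete, here is a minimal Go sketch that groups blocks into daily buckets by their timestamp, which is roughly the idea a time-partitioned database builds on. The `Block` type, its fields, and the sample heights and dates are hypothetical and only serve the illustration; they are not Zondax's actual data model.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Block is a hypothetical, heavily simplified block header.
// Real networks carry many more fields; only the timestamp matters here.
type Block struct {
	Height    int64
	Timestamp time.Time
}

// sampleBlocks returns made-up blocks spanning two UTC days.
func sampleBlocks() []Block {
	return []Block{
		{Height: 100, Timestamp: time.Date(2022, 1, 1, 0, 0, 30, 0, time.UTC)},
		{Height: 101, Timestamp: time.Date(2022, 1, 1, 23, 59, 30, 0, time.UTC)},
		{Height: 102, Timestamp: time.Date(2022, 1, 2, 0, 0, 0, 0, time.UTC)},
	}
}

// bucketByDay groups block heights by the UTC day of their timestamp,
// using "YYYY-MM-DD" strings as bucket keys.
func bucketByDay(blocks []Block) map[string][]int64 {
	buckets := make(map[string][]int64)
	for _, b := range blocks {
		day := b.Timestamp.UTC().Format("2006-01-02")
		buckets[day] = append(buckets[day], b.Height)
	}
	return buckets
}

func main() {
	buckets := bucketByDay(sampleBlocks())

	// Map iteration order is random in Go, so sort the keys before printing.
	days := make([]string, 0, len(buckets))
	for day := range buckets {
		days = append(days, day)
	}
	sort.Strings(days)

	for _, day := range days {
		fmt.Println(day, buckets[day])
	}
}
```

Because every block falls into exactly one bucket, a query for a given time range only needs to look at the buckets that overlap it.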
Choosing (and Using!) TimescaleDB
Our first encounter with TimescaleDB was through a blog post about database optimizations. Because it’s built on top of PostgreSQL, our main code and queries didn’t need to be updated, so we decided to try it and installed the corresponding Helm chart on our infrastructure.
The first tests showed a substantial increase in performance when writing to or querying the database, with no optimizations at all and without using hypertables. Those results encouraged us to keep digging into TimescaleDB’s configurations and optimizations, such as using timescaledb-tune and converting critical tables to hypertables.
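As an illustration of the hypertable step, the following Go sketch builds the SQL statement that converts an existing table into a hypertable with TimescaleDB's `create_hypertable` function. The table name `blocks`, the column name `block_time`, and the helper functions are assumptions made for this example, not Zondax's schema or code; the sketch only prints the statement and leaves running it against a database to the reader.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteLiteral wraps s in single quotes and doubles any embedded single
// quotes, so the value can be used as a SQL string literal.
func quoteLiteral(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// hypertableSQL (a hypothetical helper) builds a create_hypertable call for
// the given table and time column. if_not_exists makes the call safe to
// repeat; migrate_data is needed when the table already contains rows.
func hypertableSQL(table, timeColumn string, migrate bool) string {
	stmt := fmt.Sprintf("SELECT create_hypertable(%s, %s, if_not_exists => TRUE",
		quoteLiteral(table), quoteLiteral(timeColumn))
	if migrate {
		stmt += ", migrate_data => TRUE"
	}
	return stmt + ");"
}

func main() {
	// "blocks" and "block_time" are placeholder names for this sketch.
	fmt.Println(hypertableSQL("blocks", "block_time", true))
}
```

Migrating a large existing table can take a long time, so it is worth trying the statement on a copy of the data first.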
“If Timescale didn’t exist, we would have a problem and might need to wait a couple of weeks to process a few dozen terabytes rather than waiting only 1-2 days”
Long story short, we went from waiting a couple of weeks to process a few dozen terabytes to waiting only 1-2 days. Among the top benefits of TimescaleDB, I would highlight having the best possible write and read performance. It is a critical part of our ecosystem because it helps us provide fast and responsive services in real time. Using TimescaleDB also allows our software to stay synced with the blockchain's nodes, which is one of the most widely acknowledged advantages of our software and services. Last but not least, we also use TimescaleDB for blockchain data backups to protect data integrity.
Before finding out about TimescaleDB, we used custom PostgreSQL modifications such as indexing strategies and high-availability setups. Our team also benchmarked NoSQL databases like MongoDB, but saw no substantial improvement in the write/read speeds we needed. If Timescale didn’t exist, we would have a problem and might need to wait a couple of weeks to process a few dozen terabytes rather than waiting only 1-2 days.
We are glad that we chose Timescale and proud of the work it has helped us speed up and complete. For example, despite many challenges, we didn't give up on experimenting with new approaches to process a tremendous amount of blockchain data. Instead, we continued exploring new ideas and tools until we eventually started using TimescaleDB, which drastically shortened the time to process data and accelerated our progress in delivering quality results for the projects.
Current Deployment & Future Plans
We deploy TimescaleDB using a custom Helm chart that fits our infrastructure needs. As for programming languages, we mainly use Golang to interact with TimescaleDB, with Hasura as the main query engine for external users.
Advice & Resources
I’d recommend reading the blog posts on how to get a working deployment, on hypertables, and on how Timescale's performance compares with vanilla Postgres on the same queries.
A wise man (Yoda) once said, "You must unlearn what you have learned." You will inevitably encounter countless challenges when developing a scalable database strategy, but staying curious and willing to explore new solutions with caution can be rewarding.
We’d like to thank Ezequiel and all of the folks at Zondax for sharing their story and efforts in finding a solution to process enormous amounts of data to build backend solutions for blockchain projects.
We’re always keen to feature new community projects and stories on our blog. If you have a story or project you’d like to share, reach out on Slack (@Ana Tavares), and we’ll go from there.