Saving Devs’ Time and Compute Power With Retention Policies: The Story of Crypto Trading Platform Pintu

This is an installment of our “Community Member Spotlight” series, where we invite our customers to share their work, shining a light on their success and inspiring others with new ways to use technology to solve problems.

In this edition, Christian Halim, engineering manager at Pintu, one of Indonesia’s leading cryptocurrency trading platforms, explains how his team inserts five million data rows into Timescale every 30 minutes using Python and AWS Lambda, queries more than a billion data rows in 0.1 seconds, and saves developers’ time and computational power by using retention policies to delete a billion data rows per day automatically.

About the Company

Founded in 2020, Pintu is one of Indonesia's leading cryptocurrency trading platforms, with over six million app downloads.

In the first quarter of 2023, Pintu ranked among Indonesia's top three crypto platforms by trading volume. It also has the largest community base, with over one million members across its community and social media platforms.

In Indonesia, Pintu is fully regulated and monitored by the Commodity Futures Trading Regulatory Agency, also known as BAPPEBTI.

There are several unique value propositions for Pintu users, including:

  1. A simple, easy UI/UX even for first-time investors: instant Know Your Customer (KYC), fast 24/7 on-ramp and off-ramp through all major banks and e-wallets, and trading from as low as 11,000 Indonesian Rupiah (IDR), less than $1.
  2. Fully regulated and compliant under BAPPEBTI.
  3. Pintu Academy: the one-stop educational content platform for users to learn about various blockchains, cryptocurrencies, and the technologies behind them.
  4. Multiple crypto products in one platform, including: spot, earn, and staking.

About the Team

Pintu’s Engineering team has around 100 engineers, including frontend engineers, backend engineers, data scientists/engineers, site reliability engineers, and software development engineers in test (SDETs). At the time of this writing, Pintu has around 10 teams.

My team is responsible for handling Pintu's transaction records and users' balances. We mainly use Go and PostgreSQL in our tech stack, with gRPC as the communication protocol between services. We design and develop new features for Pintu brought to us by our Product team.

About the Project

We use Timescale for the account value chart, an analytics feature that shows the value history of users' assets over a specific timeframe.

The account value chart is a graphical aid you can use to monitor the value of your assets on Pintu. There are two kinds of account value charts:

1. Total asset value chart: it displays the value of all your assets in Pintu in Rupiah (Indonesia's official currency) value. You can access this chart on the main page of the Wallet menu.

2. Individual asset value chart: it displays the value of each of your assets in Rupiah and Token values. You can access this chart on each asset's Wallet page.

Pintu's UI: Total Wallet Value in Rupiah on the left, and Token Total Assets on the right

The Problem

We mainly used PostgreSQL as our database hosted in the cloud, and we thought it would take a lot of effort to implement our account value chart feature using regular PostgreSQL.

While brainstorming for solutions, we stumbled upon Timescale and decided to use it—it was actually our VP of Engineering, Albert Widiatmoko, who searched online for “time-series database.” He had already heard about Timescale and followed the company’s GitHub and Twitter prior to the search.

Although we had tried MongoDB, we saw some benefits in using Timescale instead of MongoDB to store our price chart data: we don't need to maintain the logic for creating OHLC manually and can use Timescale's hyperfunctions instead. Also, Timescale has better query performance and automatic obsolete-data deletion [using retention policies].

What other reasons led us to Timescale?

  1. We felt it was a good fit for our problem since we needed to periodically refresh and aggregate data to display to users.
  2. Most of us were familiar with PostgreSQL already.

We query and serve our data from Timescale to users using Go. Aside from Go, we also use Python, which we run in AWS Lambda to insert data into Timescale periodically.

We support around 200 cryptocurrencies and three million users at Pintu at the moment. To give you a more detailed picture, our Lambda runs once every 30 minutes. Each time it runs, it inserts about five million rows of data. At any point in time, there are around 1.4 billion rows in the raw table, since we use a retention policy to delete obsolete data.
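The post doesn't show the raw table's definition, but in TimescaleDB a table ingesting at this rate would typically be created as a hypertable partitioned on its time column. A minimal sketch, with hypothetical table and column names borrowed from the sample query later in the post (not Pintu's actual schema):

```sql
-- Hypothetical raw table; names are illustrative only.
CREATE TABLE user_asset_value (
    namespace      TEXT        NOT NULL,
    external_id    TEXT        NOT NULL,
    asset_type_id  BIGINT      NOT NULL,
    balance        NUMERIC     NOT NULL,
    balance_idr    NUMERIC     NOT NULL,
    at             TIMESTAMPTZ NOT NULL
);

-- Turn it into a hypertable, chunked on the time column,
-- so inserts, queries, and retention all work per time chunk.
SELECT create_hypertable('user_asset_value', 'at');
```

Partitioning by time is what later lets a retention policy drop whole chunks instead of deleting rows one by one.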

Impact of Using Timescale

Time to market is a crucial factor for us, and Timescale helped us a lot in this aspect since it already supports most of the features we wanted to develop. We could also extend the features later when needed since Timescale supports a variety of aggregates for time-series data.

“Our queries are really fast, taking only 100 ms for a table with around 1.4 billion rows”

The feature that we developed, the account value chart, was the main topic of discussion in our community when it was first released since customers could share their gains or losses over a period of time.

We used Timescale right from the start for this feature, so we can’t compare Timescale’s performance to vanilla PostgreSQL. However, our queries are really fast, taking only 100 ms for a table with around 1.4 billion rows.

Sample query:

SELECT namespace, external_id, asset_type_id, balance, balance_idr, at
FROM user_asset_value_chart_1_day
WHERE namespace = $1
	AND external_id = $2
	AND at > $3
	AND asset_type_id = $4

We mainly utilized continuous aggregates to create this feature, which greatly helped us since we didn't have to maintain multiple tables for each timeframe.

Editor's Note: Continuous aggregates are Timescale's version of PostgreSQL's incrementally-updated materialized views.

Currently, we support several timeframes, and in each timeframe, the interval of stored data is different. For example, in a one-day timeframe, we use 30-minute intervals; in a three-month timeframe, we use 12-hour intervals since the data would be too granular with 30-minute intervals. This means we would store 3 × 30 × 24 × 2 = 4,320 data points per user (30-minute intervals) vs. 3 × 30 × 2 = 180 data points per user (12-hour intervals).

--Pintu's continuous aggregates

SELECT add_continuous_aggregate_policy('wallet_asset_value_chart_1_day', start_offset => INTERVAL '2h30m', end_offset => INTERVAL '30m', schedule_interval => INTERVAL '30m');

SELECT add_continuous_aggregate_policy('wallet_asset_value_chart_1_week', start_offset => INTERVAL '5h', end_offset => INTERVAL '1h', schedule_interval => INTERVAL '1h');

SELECT add_continuous_aggregate_policy('wallet_asset_value_chart_1_month', start_offset => INTERVAL '20h', end_offset => INTERVAL '4h', schedule_interval => INTERVAL '4h');

SELECT add_continuous_aggregate_policy('wallet_asset_value_chart_3_months', start_offset => INTERVAL '2 days 12h', end_offset => INTERVAL '12h', schedule_interval => INTERVAL '12h');

SELECT add_continuous_aggregate_policy('wallet_asset_value_chart_6_months', start_offset => INTERVAL '5 days', end_offset => INTERVAL '1 day', schedule_interval => INTERVAL '1 day');
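The policies above attach refresh schedules to continuous aggregates that must already exist. A hedged sketch of what one such view could look like, with hypothetical table and column names, and assuming a TimescaleDB version (2.7+) that supports first() inside continuous aggregates:

```sql
-- Hypothetical definition; Pintu's actual view may differ.
CREATE MATERIALIZED VIEW wallet_asset_value_chart_1_day
WITH (timescaledb.continuous) AS
SELECT namespace,
       external_id,
       asset_type_id,
       time_bucket(INTERVAL '30m', at) AS at,
       -- first() picks each column's value at the earliest time in the bucket
       first(balance, at)     AS balance,
       first(balance_idr, at) AS balance_idr
FROM user_asset_value
GROUP BY namespace, external_id, asset_type_id,
         time_bucket(INTERVAL '30m', at);
```

One view per timeframe (with a different `time_bucket` width each) is what removes the need to maintain separate rollup tables by hand.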

Our team also saved time using retention policies, so we didn't have to delete obsolete data manually. We need to delete around 240 million rows of data daily from the raw table. And since we also run retention policies on each continuous aggregate—we have seven, one per timeframe—those delete approximately 800 million more rows. This adds up to around one billion deleted data rows per day when the policies run simultaneously.
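The retention calls themselves aren't shown in the post; setting one up is a single statement per hypertable or continuous aggregate. A sketch with hypothetical names and retention windows:

```sql
-- Drop raw 30-minute data once it's no longer needed (hypothetical window).
SELECT add_retention_policy('user_asset_value',
                            drop_after => INTERVAL '1 day');

-- Each continuous aggregate can carry its own, longer window.
SELECT add_retention_policy('wallet_asset_value_chart_3_months',
                            drop_after => INTERVAL '3 months');
```

Because retention drops whole time chunks rather than running row-by-row DELETEs, it avoids both the developer effort and the compute cost of manual cleanup jobs.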

Regarding hyperfunctions, we only used the first() hyperfunction in our account value chart feature as we simply needed to get users’ asset values for a specific time.
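first() returns the value of one column at the row with the earliest value of another (typically the time) column within each group. A hedged example against the same hypothetical raw table used above:

```sql
-- Earliest recorded IDR balance per asset per day for one user
-- (table, column names, and the user ID are illustrative).
SELECT time_bucket('1 day', at) AS day,
       asset_type_id,
       first(balance_idr, at)   AS opening_balance_idr
FROM user_asset_value
WHERE external_id = 'user-123'
GROUP BY day, asset_type_id;
```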

Right now, only a handful of our devs really understand Timescale since the features we've developed so far rarely involve it. Still, we are planning to share the knowledge with other engineers. We haven't really built other features on top of this yet, but we believe development will be much faster if we utilize the same data as the base.

“While we can manually delete obsolete data without Timescale, we saved our developers’ effort and the server's computational power [by creating a retention policy]”

Roadmap and Future Work

Currently, we don't have any concrete future projects utilizing Timescale in mind. However, we have also used Timescale for storing our cryptocurrency price chart, so I believe we'll keep using Timescale for the foreseeable future since we already have the foundation built.

Advice & Resources

Timescale is very powerful if your data is time series and you need to run plenty of analytics on that data. Their official docs are already great, so I really recommend developers learn from there.

If there is any advice I can give, it's to utilize Timescale's cost-saving features, such as compression and retention policies, whenever possible. We save on storage with a retention policy since it automatically deletes obsolete data.

While we can manually delete obsolete data without Timescale, we saved our developers’ effort and the server's computational power, although, unfortunately, I don't have the exact data for how much we saved.

Anything else you would like to add?

Great Customer Support! Tanya and Hardik helped us a lot during our development stage.

Facing similar problems while building your application?

We’d like to thank Christian Halim and the folks at Pintu for sharing their story on accelerating their time to market, saving with compression and retention policies, and providing users with quick insight into their gains or losses.

We’re always keen to feature new community projects and stories on our blog. If you have a project you’d like to share, reach out on Slack (@Ana Tavares), and we’ll go from there.
