Timescale Cloud Tips: How to Optimize Your Ingest Rate
Being the technical geeks we are, we always try to explain our reasoning and modus operandi to enlighten readers and empower their choices. Don't need that today? Just want to learn how to optimize your ingest rate? Go ahead, skip to the point.
One of the advantages of being a Timescale Cloud customer is that you get direct assistance from the Timescale Team. As a data architect, I help our customers with their setup daily: onboarding them to TimescaleDB, helping them find the best workload configuration and how to configure compression correctly, and overall sharing best practices on how to make the most of Timescale Cloud.
Why keep all this knowledge to ourselves? Starting with this blog post, we’ll share a series of short Timescale Cloud tips to address our customers’ recurrent questions.
It’s worth mentioning that the answer to most of these questions will ultimately depend on your use case: in the database world, it’s impossible to offer perfect, one-size-fits-all advice. That’s why there’s nothing like asking for help from the team when needed. Still, going through our general recommendations and best practices will certainly help you have a better understanding of the product you have in hand, helping you optimize your experience and troubleshoot autonomously.
Even if you are not a Timescale Cloud customer (yet!), we hope you will find this advice helpful and apply it in your own environment. If you’re curious, you can try Timescale Cloud completely for free (you won’t need a credit card to start the trial).
How Can I Improve My Ingest Rate a.k.a. Get More Row Inserts Per Second?
We couldn’t start this Timescale Tips series any other way. Time-series data workloads are relentless, and optimizing your ingest rate is a necessity. Almost every Timescale Cloud customer is interested in getting as many row inserts per second as possible into their TimescaleDB service.
How can you ensure you’re getting the best write performance? First, take a look at this blog post we published a while ago: 13 Tips to Improve PostgreSQL Insert Performance. Timescale Cloud is built on PostgreSQL, so the general best practices for maximizing your inserts into Postgres will also apply to Timescale.
However, we often find that what’s holding our Timescale Cloud customers back is “cloud-native best practices.” When dealing with a Timescale Cloud service, it’s not only essential to optimize your practices on the database level but also at the cloud infrastructure level.
Maximizing Ingests in Timescale Cloud: Our Recommendations
In our experience, this is the advice that our customers find more useful:
1. Set proper expectations.
As a general guideline and in a single-node Timescale Cloud service, you should aim for ingesting 50-100k rows per second per ingest process. If you want to write more than 50k-100k rows each second, you will need to set up multiple ingest processes.
2. Make sure you have enough resources on your client side.
The volume of writes you can get will be highly influenced by the number of resources you allocate to this process: this is true from the Timescale Cloud side, as well as from your client side.
Especially if you set up multiple processes, make sure that you have enough CPU to cover them from the client side. For example, if you’re writing to Timescale Cloud from your AWS EC2 instance, ensure that your instance size is large enough to cover your expectations. If only one ingest process is needed, then maybe a 1-CPU EC2 instance might be adequate. However, if using multiple ingest processes, adjust the CPU count to at least the number of ingest processes (e.g., four ingest processes equals four CPU instances).
3. Make sure you have enough resources on your Timescale Cloud side.
Just as on your client side, you also need enough resources in your Timescale Cloud service to handle the writes. At least, we recommend having as many CPUs in the Timescale Cloud service as in your client.
If you’re not sure that you have enough resources in your Timescale Cloud service, you can use forks to make a copy of your database with more CPU and test if your ingest performance improves. Our customers find it very useful (and also cost-effective). We will soon publish another Timescale Cloud Tips blog post explaining how to do this—watch this space.
4. Is your data crossing the world? Try to put your client and server closer together.
One of the great things about using a hosted database service is that you don't have to think about your infrastructure too much. But for certain things, it’s still worth remembering that your database is running on a server somewhere. If you want to really maximize the volume of data you’re writing, you will get better results if your data doesn’t have to cross the world to get to your Timescale Cloud server.
This (network throughput) is another major component to consider. To achieve optimal ingest rates, we highly recommend using a client located in the same AWS region as your Timescale Cloud service. Example: if you are writing massive volumes of data to a Cloud service in us-east-1, try using an EC2 instance also located in us-east-1.
5. Insert rows in batches.
Finally, let’s cover a tip that’s also included in the blog post we referenced above, which is not exclusive to a cloud database, but often arises as a problem.
To maximize your ingest rate, you should insert your data with many rows per
INSERT. When writing your data to Timescale Cloud, we strongly recommend using a single
INSERT statement with multiple tuples: don't insert your data row-by-row. Instead, try to write hundreds (or thousands) of rows per
INSERT, following the syntax below.
INSERT INTO mytable (‘2022-06-01 12:00:00’ 1, 1.11, 1.12), (‘2022-06-01 13:00:00’ 2, 2.21, 2.22)
We hope this post has given you some ideas on what to look for when trying to maximize your inserts per row into your Timescale Cloud database. Remember to look up this blog post for more specific tips on PostgreSQL and TimescaleDB.
If you still haven’t tried Timescale Cloud, you can use it for free for 30 days. It is the most convenient way to use TimescaleDB in production. You’ll get automatic backups and recovery, high availability via replicas in multiple availability zones, flexible resizing with autoscaling, and one-click database forking for testing and creating developer environments. And, of course, the advice of the Timescale Team. 😎
Stay tuned for the next post of the series!