Overheard @ Office Hours: January Edition

Check out the latest roundup of community questions, ranging from the best place to get started to ways to optimize your queries and schemas.

We hosted our first two Timescale Office Hours of the new year - and, for those who couldn’t make it or are curious about what we discuss, we’re highlighting a selection of questions and answers.

What’s Office Hours? A 2x monthly forum with our product and engineering team, open to all community members (whether you’re new-to-Timescale, an advanced user, or somewhere in between). Initially started as an experiment, it’s turned into a great, informal way to discuss use cases, answer questions in real-time, and share knowledge with community members.

If you’re interested in attending an upcoming Office Hours, we’re hosting one tomorrow, Tuesday, February 11 @ 9am PT/12pm ET/5pm GMT (see upcoming sessions to find one that works for you).

Everyone’s welcome and any question is fair game -- and we hope to see you soon!

I need help setting up my schema. Is there any advice you can give?

We just created a tool - PGSchema - to help users generate a schema for Postgres and TimescaleDB. When you visit the site, you’ll see three steps: input CSV, format, and copy.

If you need additional support as you build your schema (i.e. you have a unique use case), we recommend posting in our community Slack channel. One of our active community members or Timescale engineers will get back to you to help quickly.

Note: If you’re using Timescale Cloud, you can work directly with your Cloud Advocate to find the best options.

When should I use native compression vs materialized views?

Firstly, they’re not mutually exclusive - you can use both features simultaneously.

While the end result of these functions may be similar (faster query performance), they are actually very different in nature.

Compression allows you to store more data in less actual storage and reduces your total data footprint. Ultimately, you will achieve some query performance gains by loading fewer pages from disk into memory (reduce disk I/O) as a result of compression.

See our “Building Columnar Compression” blog post and Compression docs to learn more.

On the flip side, materialized views pre-compute information needed to execute common queries.

With TimescaleDB, our version of materialized views are a bit different: we call them “continuous aggregates” (see our docs and our blog post explanation of how they work).
The main function of continuous aggregates is to speed up queries that aggregate over time.

In conclusion, compression and continuous aggregates (materialized views) present value in two different ways - and your decision depends on your use case:

If you are collecting a lot of data (and want to keep your historical data) but running into storage limitations, use compression.

Compression’s main function is storage management. While it can improve query performance - through reduced disk IO (which everyone should see) and through the restructuring of your data (which depends on your query patterns) - improvements vary by use case.
However, everyone that uses compression will see a reduced data storage footprint.

If you want to speed up your query results (and are likely collecting data in near real-time), use continuous aggregates.

Continuous aggregates are all about improving performance. They allow you to pre-compute functions on your data without incurring the expense at query time - thus improving your query performance.
Additionally, since TimescaleDB uses materialized views and a job scheduler to keep the view up-to-date, you don’t need to manage your own cron jobs.
You can also apply data retention policies to continuous aggregates >> save a lot on storage as well.

When do you plan to release support for PG12?

In our next release! TimescaleDB 1.7 is scheduled for late Q1 of 2020. You can learn about our future plans & past releases here.

How do I get started with Timescale Cloud?

Prior to signing up for Timescale Cloud, you can sign up for a demo to discuss your needs, and our Cloud Advocate team will work with you to find the best option for your projects. If you aren’t interested in a demo, you can skip straight to the free trial which provides $300 in credits for 30 days.

After you start the free trial, we recommend reading this blog post, which walks through the basics of signing up and activating your account. From there, check out some of our sample datasets and tutorials to get a feel for how Timescale works (capabilities, functions, and more).

I know basic SQL, but I’m not sure how to optimize my queries (performance, speed, etc.)?

There are many options! First, if you’re new to TimescaleDB, you’ll find that our architecture (chunks and hypertables) is designed to optimize query performance for large time-series workloads.

We also recommend using features like time_bucket and time_bucket_gapfill, which live alongside SQL, but are optimized for time-series workloads (and replace complex SQL operations):

time_bucket is like SQL date_trunc, but it allows you to aggregate time intervals by 5 minutes, 7 hours, or 30 milliseconds (or whatever interval you need) - something that’s pretty complicated with standard SQL.
time_bucket_gapfill expands on time_bucket’s ability to aggregate arbitrary intervals, addressing a common time-series data scenario: “what do I do when I’m missing data for my interval?” (e.g., a system was down for maintenance or when a device lost network connectivity and didn’t send any values).
Gap filling allows you to choose how you’d like to “fill in” the data you’re missing: use the last value recorded until you get to the next recorded value (carrying forward the last observation) or plot a linear relationship between values.
In addition to the docs above, check out “Simplified Time-Series Analytics Using the time_bucket() Function” to get hands-on with a practical example.

Beyond our architecture itself, we’ve blogged about this topic quite a bit and encourage you to check out these posts:

How to use composite indexes to speed up time-series queries - demonstrates when building indexes, order matters.
Implementing constraint exclusion for faster query performance - discusses built-in TimescaleDB query performance optimizations.
OrderedAppend: An optimization for range partitioning - covers a TimescaleDB-specific function for performance benefits.

And, as always, you’re welcome to reach out in our community Slack channel to ask scenario-specific questions, get feedback on current queries, and/or share how you’ve improved your queries.

Where to go from here

Join an upcoming Office Hours session: we’re hosting our next Office Hours sessions tomorrow, Tuesday, February 11th at 9am PT / 12pm ET / 5pm GMT and Wednesday, February 26th at 5pm PT / 8pm ET / 12pm ACT (Thursday).