How we scaled SQL
Time-series data is largely immutable. Writes arrive primarily as appends to recent time intervals, not as updates to existing rows. Read and write workloads thus have a natural partitioning across both time and space.
TimescaleDB provides the interface of a single continuous database table across all your time-series data, even though this table is sharded into many individual chunks, each sized to ensure good performance. One simply treats this hypertable abstraction as a standard PostgreSQL table: for queries, inserts, upserts, triggers, schema changes, and management. In other words, when inserting data, one writes to the hypertable, and TimescaleDB transparently dispatches each new row to the proper chunk. Similarly, when querying data, one queries the hypertable, and TimescaleDB ensures that the query touches only a minimal set of chunks. A database can have multiple hypertables, each with its own schema and partitioning.
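As a sketch of how this looks in practice (the table and column names here are illustrative, not from TimescaleDB itself):

```sql
-- Hypothetical schema for sensor readings.
CREATE TABLE conditions (
  time        TIMESTAMPTZ       NOT NULL,
  device_id   TEXT              NOT NULL,
  temperature DOUBLE PRECISION
);

-- Convert the plain table into a hypertable partitioned on the time column.
SELECT create_hypertable('conditions', 'time');

-- Inserts and queries target the hypertable like any PostgreSQL table;
-- TimescaleDB routes rows to, and reads from, the proper chunks.
INSERT INTO conditions VALUES (now(), 'dev1', 72.5);
SELECT * FROM conditions WHERE time > now() - interval '1 day';
```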
TimescaleDB scales performance and storage by transparently partitioning hypertables across multiple dimensions: by a time interval, and optionally by one or more additional columns in your data (e.g., device identifiers for sensor data, symbols in tick data, locations, customers, or users), using both interval and hash partitioning. Each such partition is called a chunk, created automatically by the system without administrative intervention. The system scales to many such chunks – easily handling 10,000s of chunks on a single node at scale – and such chunks can be transparently spread across many disks to scale up capacity.
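Adding a second, hash-partitioned "space" dimension can be sketched as follows; the exact `create_hypertable` signature varies by TimescaleDB version, and the table and column names are again illustrative:

```sql
-- Hypothetical table for generic metrics.
CREATE TABLE metrics (
  time      TIMESTAMPTZ      NOT NULL,
  device_id TEXT             NOT NULL,
  value     DOUBLE PRECISION
);

-- Interval partitioning on time, plus hash partitioning on device_id
-- into 4 partitions (classic positional form of the API).
SELECT create_hypertable('metrics', 'time', 'device_id', 4);
```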
Chunks are dynamically created by the runtime and sized to optimize performance. In the time dimension, users may specify the interval for future chunks, or TimescaleDB can dynamically adapt the interval to optimize performance based on system capacity. Right-sized chunks ensure that the multiple B-trees for a table’s indexes can reside in memory during inserts, avoiding thrashing. Further, queries on individual chunks involve much smaller indexes or (parallelized) table scans, leading to much faster query times. Finally, right-sized chunks make automated data retention policies very efficient, as the runtime can perform such operations by simply dropping chunks (tables), rather than deleting individual rows.
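Chunk intervals and chunk-based retention might look like the following; signatures for both calls have changed across TimescaleDB versions, so treat this as a sketch against one common form:

```sql
-- Fix the time interval for future chunks at creation time
-- (hypothetical 'metrics' table from above).
SELECT create_hypertable('metrics', 'time',
                         chunk_time_interval => interval '1 day');

-- Retention: drop whole chunks older than 3 months, rather than
-- issuing row-by-row DELETEs.
SELECT drop_chunks(interval '3 months', 'metrics');
```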
TimescaleDB integrates directly into the PostgreSQL query planner and execution engine, and performs a number of low-level distributed query optimizations to improve performance. These include aggressive constraint exclusion in both time and spatial dimensions to minimize the number of chunks accessed when answering queries, as well as constifying mutable functions at execution time to further scope queries to individual chunks. TimescaleDB also introduces special optimizations for time-based GROUP BY, ORDER BY, and LIMIT clauses, as well as new time-oriented functions to simplify analytical queries.
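For example, a query combining a time predicate with time-based grouping and ordering (using the hypothetical `conditions` table; `time_bucket` is TimescaleDB's time-oriented grouping function) benefits from these optimizations, since constraint exclusion limits the scan to chunks overlapping the last 24 hours:

```sql
-- Group readings into 15-minute buckets over the last day.
SELECT time_bucket('15 minutes', time) AS fifteen_min,
       device_id,
       max(temperature) AS max_temp
FROM conditions
WHERE time > now() - interval '24 hours'
GROUP BY fifteen_min, device_id
ORDER BY fifteen_min DESC
LIMIT 10;
```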
TimescaleDB supports normal SQL and all of the features you expect from a relational database: JOINs, secondary indexes, complex WHERE predicates, GROUP BYs, aggregations and orderings, window functions, CTEs, subqueries, statistical and approximation functions, and more. In particular, JOINs between time-series data and business data or metadata stored in separate relational tables can be quite powerful. This avoids the alternatives common to most time-series databases: denormalizing metadata at insert time (both rigid and very expensive to store/update), or performing such joins in application code instead (which leads to operational complexity, application inflexibility, and integrity issues from maintaining separate databases).
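Such a join between time-series rows and normalized metadata might look like this (the `devices` metadata table is hypothetical, alongside the `conditions` table above):

```sql
-- Hypothetical metadata table kept separate from the time-series data.
CREATE TABLE devices (
  device_id TEXT PRIMARY KEY,
  location  TEXT NOT NULL
);

-- Join per-reading rows against device metadata: average temperature
-- by location over the past week.
SELECT d.location, avg(c.temperature) AS avg_temp
FROM conditions c
JOIN devices d ON c.device_id = d.device_id
WHERE c.time > now() - interval '7 days'
GROUP BY d.location;
```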
TimescaleDB can be managed like a traditional PostgreSQL database, and can leverage the rich PostgreSQL ecosystem for tooling, streaming replication and hot standbys, point-in-time recovery, and easy backups. It works with standard pg_dump and pg_restore, or with more advanced continuous backup solutions such as WAL-E. TimescaleDB supports multi-tenant operation, with many distinct projects and users running in one deployment. It supports highly flexible access control policies for individuals or groups at the database, table, column, or even row level. This enables natural ways to restrict users to different types of data (e.g., raw granular data, aggregates, derived).
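Since these are standard PostgreSQL mechanisms, such policies can be sketched with ordinary GRANT statements and row-level security; the role name and the `owner` column here are hypothetical:

```sql
-- Column-level access: an analyst role may read only time and temperature.
GRANT SELECT (time, temperature) ON conditions TO analyst_role;

-- Row-level access: each user sees only rows they own
-- (assumes a hypothetical 'owner' column on the table).
ALTER TABLE conditions ENABLE ROW LEVEL SECURITY;
CREATE POLICY per_owner ON conditions
  USING (owner = current_user);
```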