Partitioning questions

I have a couple of tables that are pretty big, 2+ billion rows. Querying by time is fast and there is no problem, but sometimes I want to use these tables for queries where I need to, for example, select all the rows that match an ID, regardless of time. Creating indexes on these columns doesn't make it much faster. Is this where partitioning comes in? Since I can't find much documentation on partitioning:

  • Will partitioning the table by the default dimension (time) and also by the said ID work better?
  • When you partition a table by more columns, what are the space consumption implications?
  • Can you further partition a table once it's been populated?

Thanks!

Hey @MasterKraft, very good question!

  1. Yes, you can add an extra dimension to partition your data, and it makes sense if you have high cardinality and need to group the data better (see the sketch after this list).
  2. It will generate more chunks, which means more metadata around them. From previous interactions, I see folks reporting that staying under ~10k chunks generally does not affect performance heavily, but I have even run a few tests with 5M chunks. So it depends on your business; it will just be another fold of your data (the sketch after this list includes a quick chunk-count query).
  3. I think yes, but only new data will start using the new partition dimension. To be honest, I have never tried it; maybe it will not be allowed, for consistency.
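If it helps, here is a minimal sketch of what points 1 and 2 look like in SQL. The table and column names (`readings`, `device_id`) are hypothetical, and whether this works on an already-populated hypertable may depend on your version, as point 3 says:

```sql
-- Hypothetical hypertable "readings", already partitioned by time.
-- Add a hash ("space") dimension on device_id with 4 partitions (classic API).
SELECT add_dimension('readings', 'device_id', number_partitions => 4);

-- Rough check of chunk/metadata growth (relates to point 2 above).
SELECT count(*) AS chunk_count
FROM timescaledb_information.chunks
WHERE hypertable_name = 'readings';
```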
  • Creating an index on ID and selecting only on the ID column should be much faster than without the index. Be careful with such queries, because they must touch every chunk, which can take some time.
  • Did you run the PostgreSQL ANALYZE command on the table after creating the index?
  • What does EXPLAIN show? Is the query using the index or not? (A minimal example follows this list.)
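For reference, a minimal sketch of the suggestions above, again with hypothetical names (`readings`, `device_id`, `ts`); check the EXPLAIN output to confirm the index is actually used on the chunks:

```sql
-- Composite index so ID lookups can also be ordered/limited by time.
CREATE INDEX ON readings (device_id, ts DESC);

-- Refresh planner statistics after creating the index.
ANALYZE readings;

-- Verify the plan: it should show index scans on each chunk, not seq scans.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM readings WHERE device_id = 42;
```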

and

It depends on the partitioning type. For hash space partitioning, a new space dimension can be added and new chunks will be created with the additional partitioning; old data will stay untouched. But… I don't really see much point in using "hash" partitioning: it was developed for multi-node partitioning (TimescaleDB installed on several computers as a cluster), which is a discontinued feature as of TimescaleDB v2.14.0. Maybe hash partitioning is useful if you can make sure every chunk hits its own physical disk with tablespace management. I haven't tested this.
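If you want to try the disk-per-chunk idea, TimescaleDB's tablespace management looks roughly like this; the tablespace and table names are hypothetical and I haven't benchmarked it either:

```sql
-- Assumes tablespaces disk1/disk2 already exist on separate physical disks.
SELECT attach_tablespace('disk1', 'readings');
SELECT attach_tablespace('disk2', 'readings');

-- New chunks are spread across the attached tablespaces;
-- detach_tablespace() and show_tablespaces() are available to manage them.
```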

Adding "range" space partitioning (new in TimescaleDB v2.13.0) can't be performed after the hypertable is filled with data. I tested this and it does not work. A new "range" space partition must be added after creating the hypertable and before inserting any data into it. This is also explained in the documentation on adding a new space dimension; see the red warning at the top of the page.
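For completeness, a sketch of that order of operations using the v2.13+ dimension builders; the names are hypothetical and the exact `by_range`/`by_hash` builder signatures may vary slightly between versions:

```sql
CREATE TABLE readings (
    ts        timestamptz      NOT NULL,
    device_id bigint           NOT NULL,
    value     double precision
);

-- 1. Create the hypertable first...
SELECT create_hypertable('readings', by_range('ts'));

-- 2. ...then add the range space dimension while the table is still empty.
SELECT add_dimension('readings', by_range('device_id', 100000));

-- 3. Only now start inserting data.
```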

But… I don't really see much point in using "hash" partitioning: it was developed for multi-node partitioning (TimescaleDB installed on several computers as a cluster), which is a discontinued feature as of TimescaleDB v2.14.0

One idea where hash dimensions could be useful: if you have something like a company_id that segments your business queries most of the time, you can have dedicated chunks for them but still keep all the aggregated data in the same table. So, instead of using different schemas or databases, you use the extra dimension. It would allow you to delete all data from a company by touching only "their chunks".
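A rough sketch of that idea, with a hypothetical `metrics` table and `company_id` column; with the hash dimension in place, chunk exclusion should keep the delete away from other companies' chunks:

```sql
-- Hash space dimension on company_id (in addition to the time dimension).
SELECT add_dimension('metrics', by_hash('company_id', 16));

-- Removing one company only has to touch the chunks holding its hash partition.
DELETE FROM metrics WHERE company_id = 1234;
```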