How to write partitioning_func correctly for create_hypertable() to place data in calendar year?

Hi! Tell me how to correctly write partitioning_func body to partition data by calendar year?
Thanks a lot.

1 Like

Hi Lux, if you’re looking for a year chunk, any reason why setting chunk time interval would not work?

If you want to setup a partitioning_func probably these examples from the tests can help you.

1 Like

Thank you for example!
The standard functionality works as expected. I want to deal with the options for creating a hypertable in addition.

Hi! In the following code create_hypertable function output error

CREATE TABLE test_table_for_time_partition(
	id NUMERIC,
	create_date TIMESTAMP,
	id_card VARCHAR(64),
	CONSTRAINT test_table_for_time_partition_pkey PRIMARY KEY(id, create_date)
);

CREATE FUNCTION fn_partition_by_year(jsonb)
  RETURNS timestamp
  LANGUAGE SQL
  IMMUTABLE AS
$func$
	SELECT date_trunc('year', ($1->>'create_date')::timestamp);
$func$;

SELECT create_hypertable('test_table_for_time_partition', 'create_date', time_partitioning_func => 'fn_partition_by_year');
SQL Error [42P13]: ERROR: invalid partitioning function
  Подсказка: A valid partitioning function for open (time) dimensions must be IMMUTABLE, take the column type as input, and return an integer or timestamp type.

How to correctly create a function that will correctly output a value for a partition?

Hey Lux, I’m curious why chunk_time_interval => ‘1 year’ would not work for your case. Also, it seems you don’t have the jsonb in the table.

Hi @jonatasdp!

The first reason is chunk_time_interval => ‘1 year’ inserts dates ‘2022-01-01’ and ‘2022-12-31’ into different chunks instead of to insert ‘2022-12-31’ into the same chunk which contains ‘2022-01-01’. This moment also applies to both weekly and monthly chunks.
The second reason is to investigate how partitioning parameters work when creating a hypertable.

Thanks! At now I understand how partitioning function works.
The following example works for me:

CREATE TABLE test_table_for_time_partition(
	id NUMERIC,
	create_date TIMESTAMP,
	CONSTRAINT test_table_for_time_partition_pkey PRIMARY KEY(id, create_date)
);

CREATE OR REPLACE FUNCTION fn_partition_by_year(timestamp)
RETURNS timestamp
LANGUAGE SQL
IMMUTABLE AS
$func$
	SELECT date_trunc('year', $1);
$func$;

SELECT create_hypertable('test_table_for_time_partition', 'create_date', time_partitioning_func => 'fn_partition_by_year');

INSERT INTO test_table_for_time_partition(id, create_date)
VALUES
	(1, TIMESTAMP '2017-01-01',),
	(2, TIMESTAMP '2018-01-01'),
	(3, TIMESTAMP '2019-01-01'),
	(4, TIMESTAMP '2020-01-01'),
	(5, TIMESTAMP '2021-01-01'),
	(6, TIMESTAMP '2022-01-01'),
	(7, TIMESTAMP '2023-01-01'),
	(8, TIMESTAMP '2017-01-02'),
	(9, TIMESTAMP '2018-01-02'),
	(10, TIMESTAMP '2019-01-02'),
	(11, TIMESTAMP '2020-01-02'),
	(12, TIMESTAMP '2021-01-02'),
	(13, TIMESTAMP '2022-01-02'),
	(14, TIMESTAMP '2023-01-02'),
	(15, TIMESTAMP '2017-01-03'),
	(16, TIMESTAMP '2017-01-04'),
	(17, TIMESTAMP '2018-01-03'),
	(18, TIMESTAMP '2019-01-03'),
	(19, TIMESTAMP '2020-01-03'),
	(20, TIMESTAMP '2021-01-03'),
	(21, TIMESTAMP '2022-01-03'),
	(22, TIMESTAMP '2023-01-02'),
	(23, TIMESTAMP '2017-12-31'),
	(24, TIMESTAMP '2018-12-31'),
	(25, TIMESTAMP '2019-12-31'),
	(26, TIMESTAMP '2020-12-31'),
	(27, TIMESTAMP '2021-12-31'),
	(28, TIMESTAMP '2022-12-31'),
	(29, TIMESTAMP '2023-12-31');

After insertion all the data for a certain year is in one chunk.

:information_source: I tryed example above for month period and following observations will be corresponding to month period.

I noted interesting CHECK constraint for chunks when time_patitioning_func is used for example for a month:

CREATE TABLE "_timescaledb_internal"."_hyper_123_1324_chunk" (
	CONSTRAINT "1234_1234_test_table_for_time_partition_pkey" PRIMARY KEY (id, create_date),
	CONSTRAINT constraint_1324 CHECK (((fn_partition_by_year(create_date) >= '2022-01-27 00:00:00'::timestamp without time zone) AND (fn_partition_by_year(create_date) < '2022-02-03 00:00:00'::timestamp without time zone)))
)
INHERITS (test_table_for_time_partition);
CREATE INDEX _hyper_123_1234_chunk_test_table_for_time_partition_expr_idx ON _timescaledb_internal._hyper_123_1234_chunk USING btree (date_trunc('month'::text, create_date) DESC);

If to insert ‘2022-01-26’ timestamp then it will be inserted in right chunk.
But I can not understand why CHECK constraint do not contain the following period corresponding to the function: fn_partition_by_year(create_date) >= '2022-01-01' and fn_partition_by_year(create_date) < '2022-02-01'.

I also noted that
SELECT create_hypertable('test_table_for_time_partition', 'create_date', time_partitioning_func => 'fn_partition_by_year');
creates default chunk_time_interval for hypertable that equals to 7 days. How does this chunk_time_interval = 7 days relate to fn_partition_by_year = SELECT date_trunc('month', $1);? May be such CHECK constraint must containg also month period? Or perhaps it is possible to ignore chunk_time_interval and assign to it NULL value? Or may be it is possible to create CHECK constraint also using some function like fn_partition_by_year() for dynamic period?

I droped hypertable and recreated it and now partition function did not locate ‘yyyy-12-31’ to chunk that contains ‘yyyy-01-01’. I lost my understanding of what was happening again.

Hi Lux, have you checked if maybe that is a time zone issue?

If not, please, create an issue in the official repository, it seems an unexpected behavior.

Hi, @jonatasdp !

Below is the simple example of the problem when data for the same year are in different partitions:

CREATE TABLE test.year_partition_hypertable(id INTEGER, create_date TIMESTAMP);

SELECT create_hypertable('test.year_partition_hypertable', 'create_date', chunk_time_interval => INTERVAL '1 year');

INSERT INTO test.year_partition_hypertable
SELECT gs, TIMESTAMP '2010-01-01' + (gs || ' month')::INTERVAL
FROM pg_catalog.generate_series(1, 50, 1) gs
;

SELECT 'SELECT * FROM ' || a || ';' FROM show_chunks('test.year_partition_hypertable') AS a;

SELECT * FROM _timescaledb_internal._hyper_346_9437_chunk;
id|create_date            |
--+-----------------------+
 1|2010-02-01 00:00:00.000|
 2|2010-03-01 00:00:00.000|
 3|2010-04-01 00:00:00.000|
 4|2010-05-01 00:00:00.000|

SELECT * FROM _timescaledb_internal._hyper_346_9438_chunk;
id|create_date            |
--+-----------------------+
 5|2010-06-01 00:00:00.000|
 6|2010-07-01 00:00:00.000|
 7|2010-08-01 00:00:00.000|
 8|2010-09-01 00:00:00.000|
 9|2010-10-01 00:00:00.000|
10|2010-11-01 00:00:00.000|
11|2010-12-01 00:00:00.000|
12|2011-01-01 00:00:00.000|
13|2011-02-01 00:00:00.000|
14|2011-03-01 00:00:00.000|
15|2011-04-01 00:00:00.000|
16|2011-05-01 00:00:00.000|

SELECT * FROM _timescaledb_internal._hyper_346_9439_chunk;
id|create_date            |
--+-----------------------+
17|2011-06-01 00:00:00.000|
18|2011-07-01 00:00:00.000|
19|2011-08-01 00:00:00.000|
20|2011-09-01 00:00:00.000|
21|2011-10-01 00:00:00.000|
22|2011-11-01 00:00:00.000|
23|2011-12-01 00:00:00.000|
24|2012-01-01 00:00:00.000|
25|2012-02-01 00:00:00.000|
26|2012-03-01 00:00:00.000|
27|2012-04-01 00:00:00.000|
28|2012-05-01 00:00:00.000|

SELECT * FROM _timescaledb_internal._hyper_346_9440_chunk;
id|create_date            |
--+-----------------------+
29|2012-06-01 00:00:00.000|
30|2012-07-01 00:00:00.000|
31|2012-08-01 00:00:00.000|
32|2012-09-01 00:00:00.000|
33|2012-10-01 00:00:00.000|
34|2012-11-01 00:00:00.000|
35|2012-12-01 00:00:00.000|
36|2013-01-01 00:00:00.000|
37|2013-02-01 00:00:00.000|
38|2013-03-01 00:00:00.000|
39|2013-04-01 00:00:00.000|
40|2013-05-01 00:00:00.000|

SELECT * FROM _timescaledb_internal._hyper_346_9441_chunk;
id|create_date            |
--+-----------------------+
41|2013-06-01 00:00:00.000|
42|2013-07-01 00:00:00.000|
43|2013-08-01 00:00:00.000|
44|2013-09-01 00:00:00.000|
45|2013-10-01 00:00:00.000|
46|2013-11-01 00:00:00.000|
47|2013-12-01 00:00:00.000|
48|2014-01-01 00:00:00.000|
49|2014-02-01 00:00:00.000|
50|2014-03-01 00:00:00.000|

And the CHECK constraint in the definition of for example _timescaledb_internal._hyper_346_9438_chunk is the following:

CREATE TABLE "_timescaledb_internal"."_hyper_346_9438_chunk" (
	CONSTRAINT constraint_9314 CHECK (((create_date >= '2010-05-31 00:00:00'::timestamp without time zone) AND (create_date < '2011-05-26 00:00:00'::timestamp without time zone)))
);

Instead of period >=2011-01-01 and < 2012-01-01 it is >= 2010-05-31 and < 2011-05-26.

The same situation is when chunk_time_interval = 1 month:

CREATE TABLE "_timescaledb_internal"."_hyper_348_9447_chunk" (
	CONSTRAINT constraint_9323 CHECK (((create_date >= '2010-01-31 00:00:00'::timestamp without time zone) AND (create_date < '2010-03-02 00:00:00'::timestamp without time zone)))
);

id|create_date            |
--+-----------------------+

30|2010-01-31 00:00:00.000|
31|2010-02-01 00:00:00.000|
32|2010-02-02 00:00:00.000|
33|2010-02-03 00:00:00.000|
34|2010-02-04 00:00:00.000|
...
57|2010-02-27 00:00:00.000|
58|2010-02-28 00:00:00.000|
59|2010-03-01 00:00:00.000|

Is this normal behaviour or should I create an issue?

Thanks in advance.

Hi Lux! Thanks for bringing all the details and investigating it! I totally agree that it seems weird behavior. I also tried to generate_series start from 0 instead of 1 to have the first date in January but it still breaks down the date a bit later. I also used timestamptz and the date still brings the same.

I’ll ask some core database team member if they can take a look here, and then we’ll know if we need to create an issue for it.

Hi @jonatasdp!
I tried partitioning for calendar year/month the following way and it was successfull:

CREATE TABLE test.ht_chunk_calendar_period(
	id INTEGER,
	created_at TIMESTAMP,
	CONSTRAINT ht_chunk_calendar_period_pk PRIMARY KEY (id, created_at)
);

CREATE OR REPLACE FUNCTION test.fn_calendar_period_partition(TIMESTAMP)
RETURNS TIMESTAMP
LANGUAGE SQL
IMMUTABLE AS $func$
	SELECT date_trunc('year', $1);
$func$;

SELECT *
FROM create_hypertable('test.ht_chunk_calendar_period', 'created_at',
		chunk_time_interval => INTERVAL '1 year',
		time_partitioning_func => 'test.fn_calendar_period_partition');

INSERT INTO test.ht_chunk_calendar_period
SELECT ROW_NUMBER() OVER(PARTITION BY date_trunc('year', cal.dt)), cal.dt
FROM pg_catalog.generate_series(TIMESTAMP '2020-01-01', TIMESTAMP '2026-01-01', INTERVAL '1h') AS cal(dt);

But I do not understand how CHECK CONSTRAINT works because its bounderies are not the start and end of calendar period. For example constraint for 2020 year is:

CONSTRAINT constraint_294 CHECK (((
	test.fn_calendar_period_partition(created_at) >= '2019-04-14 00:00:00'::timestamp without time zone)
	AND (test.fn_calendar_period_partition(created_at) < '2020-04-08 00:00:00'::timestamp without time zone)))

Hio @LuxCore, looking the function definition, it seems IMMUTABLE, so probably it’s caching the result from the day you run it first time :thinking:

This behaviour still exists. How can I workaround this caching the result from the day I run it first time?

Thanks in advance