Thanks for the suggestion. I can try to give a short overview of the stats:
I have 297 chunks in total, of which 209 are compressed. So essentially, my updates only touch the 97 chunks, since I never update in the compressed regions.
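The split can be reproduced with something along these lines (assuming the TimescaleDB 2.x information views):

-- Count compressed vs. uncompressed chunks for the hypertable
SELECT is_compressed, count(*)
FROM timescaledb_information.chunks
WHERE hypertable_name = 'DataPoints'
GROUP BY is_compressed;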
My table looks like this:
CREATE TABLE IF NOT EXISTS public."DataPoints"
(
    "Id" bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 MINVALUE 1 MAXVALUE 9223372036854775807 CACHE 1 ),
    "DeliveryStartUTC" timestamp with time zone NOT NULL,
    "Val" numeric NOT NULL,
    "CurveId" integer NOT NULL,
    "TimeRegisteredUTC" timestamp with time zone NOT NULL,
    "InsertedAtUTC" timestamp with time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT "FK_DataPoints_Curves_CurveId" FOREIGN KEY ("CurveId")
        REFERENCES public."Curves" ("Id") MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE CASCADE
);
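The table is a hypertable; its creation would look roughly like this (treating "DeliveryStartUTC" as the time dimension is an assumption here, and the chunk interval argument is omitted since it is not part of this post):

-- Convert the table into a hypertable partitioned on the delivery timestamp
SELECT create_hypertable('public."DataPoints"', 'DeliveryStartUTC');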
I have an index on "DeliveryStartUTC" and "Id" in the same b-tree to make my update query faster. As seen in the query plan, the update also uses this index.
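The index definition itself is not shown above, but it would be roughly this (the name is just illustrative):

-- B-tree used by the UPDATE in the trigger
CREATE INDEX IF NOT EXISTS "IX_DataPoints_DeliveryStartUTC_Id"
    ON public."DataPoints" ("DeliveryStartUTC", "Id");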
The update statement is part of a trigger. The trigger checks whether the value of the current newest record is the same as the incoming one. If it is the same, the new row is not inserted; if it is different, it is inserted. If the incoming record has an older TimeRegisteredUTC than the current record, the trigger updates the current record with that older (new oldest seen) TimeRegisteredUTC, so that TimeRegisteredUTC always reflects when we could in reality expect to have first seen the value.
The trigger runs pretty smoothly. From my findings, it is just the update that kills performance. Changing the update to an insert helps quite a bit, but it still sometimes falls a bit behind. I have also dropped my hypertable several times while testing things out. I feel my first runs with the same chunk size were significantly faster. Maybe that is just a placebo effect from the update being slow? I just want to make sure there is nothing I should clean up from the previously dropped compressed hypertables.
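In case it helps rule that out, the standard TimescaleDB helpers should show what is actually left taking up space, e.g.:

-- List the hypertables that currently exist and how many chunks they hold
SELECT hypertable_name, num_chunks, compression_enabled
FROM timescaledb_information.hypertables;

-- Break down the current hypertable's size (table, index, toast)
SELECT * FROM hypertable_detailed_size('public."DataPoints"');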
As mentioned, there is an index covering the UPDATE and all the SELECT statements, with the ORDER BY column included in the index. I am not using skip scan, but running the SELECT statements on their own seems to perform very well without it.
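A sketch of what that covering index would look like (the name and the DESC ordering are assumptions, not copied from the actual schema):

-- Serves the two SELECTs in the trigger, with the ORDER BY column last
CREATE INDEX IF NOT EXISTS "IX_DataPoints_CurveId_DeliveryStartUTC_TimeRegisteredUTC"
    ON public."DataPoints" ("CurveId", "DeliveryStartUTC", "TimeRegisteredUTC" DESC);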
The trigger is here:
CREATE FUNCTION public.before_insert_on_datapoints()
    RETURNS trigger
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE NOT LEAKPROOF
AS $BODY$
DECLARE
    prev_rec RECORD;
    next_rec RECORD;
BEGIN
    -- Find the record with the maximum "TimeRegisteredUTC" that is less than or equal to new."TimeRegisteredUTC"
    SELECT "DeliveryStartUTC", "Val", "CurveId"
    FROM public."DataPoints"
    WHERE "CurveId" = new."CurveId"
      AND "DeliveryStartUTC" = new."DeliveryStartUTC"
      AND "TimeRegisteredUTC" <= new."TimeRegisteredUTC"
    ORDER BY "TimeRegisteredUTC" DESC
    LIMIT 1
    INTO prev_rec;

    -- If a previous record exists and carries the same values, skip the insert
    IF prev_rec = (new."DeliveryStartUTC", new."Val", new."CurveId") THEN
        RETURN NULL;
    END IF;

    -- Find the record with the minimum "TimeRegisteredUTC" that is greater than or equal to new."TimeRegisteredUTC"
    SELECT "DeliveryStartUTC", "Val", "CurveId", "TimeRegisteredUTC", "Id", "InsertedAtUTC"
    FROM public."DataPoints"
    WHERE "CurveId" = new."CurveId"
      AND "DeliveryStartUTC" = new."DeliveryStartUTC"
      AND "TimeRegisteredUTC" >= new."TimeRegisteredUTC"
    ORDER BY "TimeRegisteredUTC" ASC
    LIMIT 1
    INTO next_rec;

    -- If a next record exists and carries the same value, don't insert the new row
    IF next_rec IS NOT NULL
       AND next_rec."CurveId" = new."CurveId"
       AND next_rec."DeliveryStartUTC" = new."DeliveryStartUTC"
       AND next_rec."Val" = new."Val" THEN
        -- Update the existing record only if the new TimeRegisteredUTC is older,
        -- so it always keeps the oldest TimeRegisteredUTC seen.
        -- This is used when getting historical data.
        IF new."TimeRegisteredUTC" < next_rec."TimeRegisteredUTC" THEN
            UPDATE public."DataPoints"
            SET "TimeRegisteredUTC" = new."TimeRegisteredUTC"
            WHERE "DeliveryStartUTC" = next_rec."DeliveryStartUTC"
              AND "Id" = next_rec."Id";
        END IF;
        -- Don't insert the new record
        RETURN NULL;
    END IF;

    -- If no matches found, allow insertion of the new record
    RETURN new;
END $BODY$;
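For completeness, the function is attached with a plain BEFORE INSERT row trigger, roughly like this (the trigger name is illustrative):

-- Fire the function for every row before it is inserted into the hypertable
CREATE TRIGGER before_insert_on_datapoints
    BEFORE INSERT ON public."DataPoints"
    FOR EACH ROW
    EXECUTE FUNCTION public.before_insert_on_datapoints();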