Last(*) in timebucket

Hey, let me lay out my scenario.

I have a bunch of data, each row is around 1-2 seconds apart and there is approx 10 million rows of data.
Each rows has an account_id and value.

For each account
I want to be able to group the data into 30 minute time buckets and select the latest value in each bucket. I only need the data for the past x days (in the example below, 90 days) I am having a lot of trouble creating an efficient query to do this.

Here is an example

SELECT
last(account_id, timestamp),
last(timestamp, timestamp),
last(value, timestamp),
last(c, timestamp),
last(d, timestamp),
last(e, timestamp),
last(f, timestamp),
last(g, timestamp),
time_bucket('0.5 hour', timestamp) AS bucket
FROM table
WHERE account_id = '?' 
AND timestamp >= date_trunc('day', CURRENT_TIMESTAMP - interval '90 day')
GROUP BY bucket
ORDER BY bucket

Ideally, i want to have something like but i am not sure how.

select last(*, timestamp)

This query also doesnt seem very efficient, and takes up to 30 seconds to run.

If i just do a simple order by timestamp query, and then do the ‘bucketing’ in the application java code it runs much faster. Wondering if i am misusing timescale here.

Hey @marcogroot , can you run EXPLAIN ANALYZE on the query and share the output? You can also use https://explain.dalibo.com/ or https://explain.depesz.com/ to help you understand the output.

Probably you’re missing some indices and depending on how you build it you can get it fast.

Another detail is the last of order by bucket seems not fair. Probably you want the last by snapshot_timestamp itself and not the bucket.

Also, it seems your snapshot_timestamp is not your time dimension, so probably you’ll need extra indices or created combined indices for such cases.

This video can help you too: https://www.youtube.com/watch?v=31EmOKBP1PY

Hey,

Was able to get the query working efficiently. To be honest not really sure what is that different but is running quickly now :smiley:

New sql

            SELECT
                last(id, timestamp) as id,
                last(account_id, timestamp) as account_id,
                locf(last(c, timestamp)) as c,
                locf(last(timestamp, timestamp)) as timestamp,
                locf(last(d, timestamp)) d,
                locf(last(e, timestamp)) as e,
                locf(last(f, timestamp)) as f,
                time_bucket_gapfill('0.5 hour', timestamp) AS bucket
            FROM table
            WHERE account_id = :account_id::uuid
            AND c = :c
            AND timestamp > :startTime::timestamptz and timestamp < :endTime::timestamptz
            GROUP BY bucket
            ORDER BY bucket 

Indexes:

CREATE INDEX IF NOT EXISTS idx_account_id_c on table(account_id, c);
CREATE INDEX IF NOT EXISTS idx_id on table(id);
CREATE INDEX IF NOT EXISTS idx_timestamp on table(timestamp);

Explain

//        --
//        Limit  (cost=383.22..389.22 rows=1 width=176) (actual time=2.502..6.886 rows=15 loops=1)
//        CTE all_time_zones
//        ->  Custom Scan (GapFill)  (cost=310.23..383.22 rows=200 width=176) (actual time=2.482..6.146 rows=684 loops=1)
//        ->  GroupAggregate  (cost=310.23..380.22 rows=200 width=0) (actual time=2.477..5.849 rows=482 loops=1)
//        Group Key: (time_bucket_gapfill('00:30:00'::interval, _hyper_1_1_chunk.timestamp, NULL::timestamp with time zone, NULL::timestamp with time zone))
//        ->  Sort  (cost=310.23..317.34 rows=2844 width=67) (actual time=2.453..2.612 rows=2411 loops=1)
//        Sort Key: (time_bucket_gapfill('00:30:00'::interval, _hyper_1_1_chunk.timestamp, NULL::timestamp with time zone, NULL::timestamp with time zone))
//        Sort Method: quicksort  Memory: 363kB
//        ->  Result  (cost=0.00..147.07 rows=2844 width=67) (actual time=0.020..1.654 rows=2843 loops=1)
//        ->  Append  (cost=0.00..111.52 rows=2844 width=58) (actual time=0.019..1.186 rows=2843 loops=1)
//        ->  Seq Scan on _hyper_1_1_chunk  (cost=0.00..43.70 rows=1385 width=56) (actual time=0.019..0.417 rows=1385 loops=1)
//        Filter: ((timestamp > '2024-04-01 11:19:52.270045+10:30'::timestamp with time zone) AND (timestamp < '2024-04-17 11:19:52.270045+09:30'::timestamp with time zone) AND (account_id = '1a1473d4-129a-4d0e-bcd6-d0760d8182e5'::uuid) AND (c = 'x'::text))
//        ->  Seq Scan on _hyper_1_2_chunk  (cost=0.00..53.60 rows=1459 width=59) (actual time=0.005..0.558 rows=1458 loops=1)
//        Filter: ((timestamp > '2024-04-01 11:19:52.270045+10:30'::timestamp with time zone) AND (timestamp < '2024-04-17 11:19:52.270045+09:30'::timestamp with time zone) AND (account_id = '1a1473d4-129a-4d0e-bcd6-d0760d8182e5'::uuid) AND (c = 'x'::text))
//        Rows Removed by Filter: 222
//        Planning Time: 2.147 ms
//        Execution Time: 7.077 ms
//        (20 rows)