“Timescale Cloud is very user-friendly in the sense that it is easy to comprehend and implement — and it’s affordable! More importantly, it is a scalable long-term approach suited to fit our use case.”
— Kshitij Purwar, Blue Sky Analytics, CTO
Blue Sky Analytics is building the “Bloomberg for Environmental Data” to drive sustainable decision-making and protect the global market from climate-change-induced threats. To do this, Blue Sky Analytics collects geospatial and time-series data and turns it into continuous, high-resolution datasets that monitor pollution levels, water quality, emissions, fires, changes in soil composition, and more.
Data like this accumulates very quickly and can cause complications in a traditional relational database. Since adding Timescale Cloud to their technology stack, the Blue Sky Analytics team spends less time on database management and more time analyzing their high-volume datasets.
Previously, Blue Sky Analytics used MySQL for time-series data management, but querying that data required an ad hoc approach (i.e., null filling with multiple cron jobs) that became increasingly difficult to maintain as they grew.
Blue Sky Analytics needed a solution that provided advanced analytics along with a scalable architecture, SQL support, and geospatial data capabilities. The team also wanted to get up and running quickly. As a fast-moving startup that grew from 3 to 15 members in just 9 months, they were keen to find something with a minimal learning curve that they could spin up and scale fast. These requirements led them to choose Timescale Cloud, the fully hosted and managed version of TimescaleDB.
Now the Blue Sky Analytics team focuses on deriving insights from their time-series data instead of worrying about maintenance. They use several of Timescale Cloud's built-in functions, such as gap filling, time bucketing, and continuous aggregates, which do the “heavy lifting” on the data management side.
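As a rough illustration of time bucketing and continuous aggregates, the sketch below precomputes daily averages per grid point so that they do not have to be recalculated on every query. It assumes a `measurements` hypertable with the `recorded_at`, `grid`, `u_wind`, and `v_wind` columns that appear in the query later in this article; the view name `measurements_daily` and the refresh-policy intervals are hypothetical choices, not Blue Sky Analytics' actual configuration.

```sql
-- Hypothetical continuous aggregate: daily averages per grid point.
-- Assumes "measurements" is already a TimescaleDB hypertable partitioned on recorded_at.
CREATE MATERIALIZED VIEW measurements_daily
WITH (timescaledb.continuous) AS
SELECT
  time_bucket(INTERVAL '1 day', recorded_at) AS bucket,  -- fixed one-day buckets
  grid,
  avg(u_wind) AS u_wind,
  avg(v_wind) AS v_wind
FROM measurements
GROUP BY bucket, grid;

-- Refresh the view in the background; these intervals are illustrative only.
SELECT add_continuous_aggregate_policy('measurements_daily',
  start_offset      => INTERVAL '1 month',
  end_offset        => INTERVAL '1 day',
  schedule_interval => INTERVAL '1 hour');
```

The view uses plain `time_bucket` because `time_bucket_gapfill` cannot appear in a continuous aggregate definition; any gap filling would be applied when querying the view.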
“Timescale Cloud has been greatly beneficial to us and accelerated our development process. For us, the greatest advantage of using Timescale Cloud is that it enables us to do spatial-temporal queries with ease, which saves us a huge amount of time when we analyze data. No more reading 100s of files or writing custom scripts for each analysis – we just dump it into Timescale Cloud and query using SQL.”
— Kshitij Purwar, Blue Sky Analytics, CTO
Blue Sky Analytics collects an enormous amount of data, terabytes of raw data from orbiting satellites and ground monitors, to produce environmental intelligence. For example, they have more than 1,000 ground monitors across India and parse data from the Sentinel-5P, Aqua, and Terra satellite missions.
Because of environmental and other factors, Blue Sky Analytics’ incoming ground and satellite data is often inconsistent and missing data points. The team uses built-in Timescale Cloud features such as gap filling to address these problems and compile a smooth trend line for further analysis.
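A minimal sketch of this kind of gap filling, using TimescaleDB's `time_bucket_gapfill` together with its `locf` (last observation carried forward) and `interpolate` functions. The `measurements`, `recorded_at`, `grid`, and `aod550` names come from the query later in this article, and `aod550` is assumed to be a numeric column; the 30-day window and the side-by-side comparison of `locf` and `interpolate` are illustrative, not a description of Blue Sky Analytics' actual pipeline.

```sql
-- Daily series per grid point over the last 30 days, with empty days filled in.
SELECT
  time_bucket_gapfill(INTERVAL '1 day', recorded_at,
                      NOW() - INTERVAL '30 days', NOW()) AS day,
  grid,
  avg(aod550)              AS aod550_raw,     -- NULL on days with no readings
  locf(avg(aod550))        AS aod550_locf,    -- carries the last known value forward
  interpolate(avg(aod550)) AS aod550_interp   -- linear interpolation between neighboring buckets
FROM measurements
WHERE recorded_at > NOW() - INTERVAL '30 days'
  AND recorded_at < NOW()
GROUP BY day, grid
ORDER BY grid, day;
```

`locf` suits readings that stay valid until the next observation, while `interpolate` produces a smoother trend line; note that `interpolate` returns NULL at the edges of the window when there is no neighboring bucket to interpolate from.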
Because Timescale Cloud also works well with PostGIS (a PostgreSQL extension designed specifically for geospatial data), the Blue Sky Analytics team can write geospatial JOINs, slice data by time, and fill nulls more effectively.
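Blue Sky Analytics shared one of their complex but powerful geospatial queries, which illustrates what PostGIS and Timescale Cloud can do together. It combines the PostGIS `ST_Within` predicate with TimescaleDB's `time_bucket_gapfill` to return gap-filled daily averages for the past week, grouped by district, for every district in Punjab.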
```sql
SELECT
  name,
  -- Aggregates all the points in one district into a JSON array
  json_agg(json_build_object('datetime', datetime, 'u_wind', u_wind, 'v_wind', v_wind, 'albedo', albedo, 'aod469', aod469, 'aod550', ...  -- (rest of this line is truncated in the source)
FROM (
  -- Selects all the districts in the state of Punjab
  SELECT
    name,
    shape
  FROM
    "shapes"
  WHERE
    TYPE = 'District'
    AND ST_WITHIN(shape, (
      SELECT shape FROM "shapes"
      WHERE name = 'Punjab'))) AS Districts
LEFT JOIN (
  -- Selects all the points from the last week within Punjab, as daily averages
  SELECT
    time_bucket_gapfill('1 day', recorded_at, NOW() - interval '1 week', NOW()) AS datetime,
    grid,
    avg(u_wind) AS u_wind,
    avg(v_w ...  -- (rest of this line is truncated in the source)
  FROM
    "measurements"
  WHERE
    -- Keep only the points from the last week
    recorded_at < NOW()
    AND recorded_at > NOW() - interval '1 week'
    -- Keep only the points inside Punjab
    AND ST_WITHIN(grid, (
      SELECT shape FROM "shapes"
      WHERE name = 'Punjab'))
  GROUP BY
    grid, datetime) AS Records
-- Join each point to the district that contains it
ON ST_Within(grid, Districts.shape)
-- Finally, group the points by district
GROUP BY
  name
```
Blue Sky Analytics’ first product, BreeZo, is a freely available web application that provides high-resolution monitoring of air pollution levels in India. They have big plans to extend their environmental intelligence technology to cover water in 2020, land in 2021, and other environmental factors in the coming years.
Each of these new areas will require the Blue Sky Analytics team to handle even larger quantities of complex data, and they have built their architecture around Timescale Cloud so that it can scale with those plans.
“Timescale Cloud is a great place to start working with time-series data, and I strongly recommend it to developers. With Timescale Cloud, you can adjust the system to meet your needs in terms of scale and flexibility…the possibilities are endless! It is truly a cutting edge technology that has expedited our mission of commanding the space of environmental data.”
— Kshitij Purwar, Blue Sky Analytics, CTO
With a stable, scalable infrastructure in place, Blue Sky Analytics has a significantly simpler development process and is free to focus on its overarching goal: becoming the go-to source for environmental data around the world.