“Timescale Cloud is very user-friendly in the sense that it is easy to comprehend and implement — and it’s affordable! More importantly, it is a scalable long-term approach suited to fit our use case.”— Kshitij Purwar, Blue Sky Analytics, CTO
Blue Sky Analytics is building the “Bloomberg for Environmental Data” to drive sustainable decision making and safeguard the global market from climate-change-induced threats. To do this, Blue Sky Analytics collects geospatial and time-series data to create continuous, high-resolution data sets to monitor pollution levels, water quality, emissions, fires, changes in soil composition, and more.
Data like this accumulates very quickly, which can cause complications in a traditional relational database. Since adding Timescale Cloud to their technology stack, the Blue Sky Analytics team spends less time on database management and more time analyzing their high-volume datasets.
Previously, Blue Sky Analytics used MySQL to manage time-series data, but this required an ad hoc approach to querying (i.e., filling nulls with multiple cron jobs), which became increasingly difficult as they grew.
Blue Sky Analytics needed a solution that provided advanced analytics in addition to a scalable architecture, SQL support, and geospatial data capabilities. The team also wanted to get up and running quickly: as a fast-moving startup that grew from 3 to 15 members in just 9 months, they were keen to find something with a minimal learning curve that they could spin up and scale fast. All of these requirements led them to choose Timescale Cloud, the fully hosted and managed version of TimescaleDB.
Now, the Blue Sky Analytics team focuses on deriving insights from their time-series data instead of worrying about maintenance. They use several built-in Timescale Cloud features, such as gap filling, time bucketing, and continuous aggregates, that do the “heavy lifting” on the management side.
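Continuous aggregates are easiest to picture as a table of per-bucket partial results that stays up to date as new data arrives. The sketch below illustrates only that principle in plain Python; it is not how TimescaleDB implements the feature, and the `DailyAverage` class, its `refresh` method, and the `(grid, recorded_at, value)` row shape are hypothetical. In TimescaleDB itself, a continuous aggregate is declared in SQL (in recent versions, as a materialized view created `WITH (timescaledb.continuous)`) and refreshed by the database.

```python
from datetime import datetime, timedelta, timezone

# Plain-Python illustration of the idea behind a continuous aggregate.
# Hypothetical: DailyAverage, refresh() and the (grid, recorded_at, value)
# row shape are made up for this sketch; TimescaleDB does this inside the DB.

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_DAY = timedelta(days=1)


def time_bucket(ts: datetime, width: timedelta = ONE_DAY) -> datetime:
    """Floor ts to the start of its fixed-width bucket, counted from the Unix epoch.
    For one-day buckets this is midnight UTC."""
    return EPOCH + ((ts - EPOCH) // width) * width


class DailyAverage:
    """Materialized per-(grid, day) partial aggregates: a running sum and count."""

    def __init__(self) -> None:
        self._sum: dict = {}
        self._count: dict = {}

    def refresh(self, new_rows) -> None:
        """Fold only newly arrived rows into their buckets; old raw rows are never rescanned."""
        for grid, recorded_at, value in new_rows:
            key = (grid, time_bucket(recorded_at))
            self._sum[key] = self._sum.get(key, 0.0) + value
            self._count[key] = self._count.get(key, 0) + 1

    def avg(self, grid, day: datetime):
        """Return the daily average for a grid point, or None if the bucket is empty."""
        key = (grid, day)
        count = self._count.get(key, 0)
        return self._sum[key] / count if count else None


if __name__ == "__main__":
    t0 = datetime(2020, 6, 1, 6, tzinfo=timezone.utc)
    day = time_bucket(t0)
    agg = DailyAverage()
    agg.refresh([("g1", t0, 10.0), ("g1", t0 + timedelta(hours=3), 20.0)])
    print(agg.avg("g1", day))  # 15.0
    agg.refresh([("g1", t0 + timedelta(hours=5), 30.0)])  # a later batch
    print(agg.avg("g1", day))  # 20.0
```

Keeping a sum and a count per bucket (rather than the average itself) is what makes the update incremental. Unlike a real continuous aggregate, this sketch only handles appended rows; it has no way to account for raw rows that are later updated or deleted.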
Blue Sky Analytics collects an enormous amount of data (terabytes of raw data from satellites in orbit and from ground monitors) to produce environmental intelligence. For example, they have 1,000+ ground monitors across India and parse data from the Sentinel-5P, Aqua, and Terra satellite missions.
Because of various environmental and other factors, Blue Sky Analytics’ incoming ground and satellite data is often inconsistent and has missing data points. They use built-in Timescale Cloud features to solve these data challenges and compile a smooth trend line that can be used for future analysis.
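To make the gap-filling step concrete, here is a plain-Python sketch of the semantics of TimescaleDB's `time_bucket_gapfill()` combined with its `locf()` (last observation carried forward) and `interpolate()` functions. In Timescale Cloud this all happens in SQL at query time; the Python function names, the `(timestamp, value)` row shape, and the sample readings are illustrative assumptions, not Blue Sky Analytics' code.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean

# Plain-Python illustration of time_bucket_gapfill() + locf() / interpolate().
# Hypothetical: function names mirror the SQL ones, but rows are (timestamp, value)
# tuples and every bucket is a fixed width counted from the Unix epoch.

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, width):
    """Floor ts to the start of its fixed-width bucket."""
    return EPOCH + ((ts - EPOCH) // width) * width


def gapfill(rows, start, finish, width=timedelta(days=1)):
    """Average rows per bucket and emit every bucket in [start, finish),
    with None for buckets that received no readings."""
    buckets = {}
    for ts, value in rows:
        if start <= ts < finish:
            buckets.setdefault(bucket_start(ts, width), []).append(value)
    series, b = [], bucket_start(start, width)
    while b < finish:
        values = buckets.get(b)
        series.append((b, mean(values) if values else None))
        b += width
    return series


def locf(series):
    """Last observation carried forward: repeat the latest known value into gaps."""
    last, out = None, []
    for b, v in series:
        if v is not None:
            last = v
        out.append((b, last))
    return out


def interpolate(series):
    """Linearly interpolate gaps between the nearest known buckets on each side.
    Gaps before the first or after the last reading stay None."""
    known = [i for i, (_, v) in enumerate(series) if v is not None]
    out = list(series)
    for left, right in zip(known, known[1:]):
        lv, rv = series[left][1], series[right][1]
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)  # buckets are evenly spaced
            out[i] = (series[i][0], lv + frac * (rv - lv))
    return out


if __name__ == "__main__":
    utc = timezone.utc
    readings = [
        (datetime(2020, 6, 1, 6, tzinfo=utc), 10.0),
        (datetime(2020, 6, 1, 18, tzinfo=utc), 20.0),
        (datetime(2020, 6, 4, 12, tzinfo=utc), 30.0),  # June 2 and 3 have no data
    ]
    series = gapfill(readings, datetime(2020, 6, 1, tzinfo=utc), datetime(2020, 6, 5, tzinfo=utc))
    print([v for _, v in series])               # [15.0, None, None, 30.0]
    print([v for _, v in locf(series)])         # [15.0, 15.0, 15.0, 30.0]
    print([v for _, v in interpolate(series)])  # [15.0, 20.0, 25.0, 30.0]
```

LOCF suits readings that change slowly, while linear interpolation draws a straight line between two real observations, which is closer to the smooth trend line described above.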
Additionally, since Timescale Cloud works well with PostGIS (a Postgres extension specifically designed to handle geospatial data), the Blue Sky Analytics team can create geospatial JOINs, perform time slicing, and implement null filling more effectively.
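A geospatial JOIN of this kind matches each measurement's grid point to the district polygon that contains it. The plain-Python sketch below shows only those semantics, using an even-odd ray-casting test in place of PostGIS's `ST_Within`; the function names, the toy square "districts", and the readings are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Plain-Python illustration of a geospatial JOIN on point-in-polygon containment.
# Hypothetical: shapes are simple polygons given as (x, y) vertex lists; PostGIS
# would use geometry columns, ST_Within and a spatial index instead.

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Even-odd ray casting: cast a ray to the right of p and count edge crossings.
    Points exactly on an edge are not handled consistently here, whereas
    PostGIS ST_Within returns false for them."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_left_join(districts: Dict[str, Polygon],
                      readings: List[Tuple[Point, float]]) -> Dict[str, List[float]]:
    """Collect, for every district, the readings whose grid point lies inside it.
    Every district is kept, even with no matches, as in a LEFT JOIN."""
    joined: Dict[str, List[float]] = {name: [] for name in districts}
    for grid, value in readings:
        for name, shape in districts.items():
            if point_in_polygon(grid, shape):
                joined[name].append(value)
    return joined


if __name__ == "__main__":
    districts = {
        "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
        "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
        "C": [(5, 5), (6, 5), (6, 6), (5, 6)],
    }
    readings = [((0.5, 0.5), 10.0), ((0.2, 0.8), 20.0), ((1.5, 0.5), 30.0), ((9.0, 9.0), 40.0)]
    print(spatial_left_join(districts, readings))
    # {'A': [10.0, 20.0], 'B': [30.0], 'C': []}
```

The nested loop tests every point against every polygon, which is fine for a toy example but is exactly the work PostGIS avoids on real data with spatial (GiST) indexes. Also note that in SQL an unmatched district in a LEFT JOIN gets one row of NULLs rather than the empty list used here.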
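Blue Sky Analytics shared one of their complex geospatial queries, which illustrates what PostGIS and Timescale Cloud can do together. For every district in Punjab, it returns the past week's measurements as daily averages (computed with time_bucket_gapfill), aggregated into one JSON array per district: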
```sql
SELECT name,
       -- Aggregates all the points in one district
       json_agg(json_build_object('datetime', datetime, 'u_wind', u_wind,
                                  'v_wind', v_wind, 'albedo', albedo,
                                  'aod469', aod469, 'aod550', …
FROM (
    -- Selects all the districts in Punjab
    SELECT name, shape
    FROM "shapes"
    WHERE TYPE = 'District'
      -- Selects all the `Districts` in `Punjab` state
      AND ST_WITHIN(shape, (SELECT shape FROM "shapes" WHERE name = 'Punjab'))
) AS Districts
LEFT JOIN (
    -- Select all the points in last week with daily average and within Punjab
    SELECT time_bucket_gapfill('1 day', recorded_at, NOW() - interval '1 week', NOW()) AS datetime,
           grid,
           avg(u_wind) AS u_wind,
           avg(v_w…
    FROM "measurements"
    WHERE
      -- Get the points within 1 week
      recorded_at < NOW()
      AND recorded_at > NOW() - interval '1 week'
      -- Get the point only in Punjab
      AND ST_WITHIN(grid, (SELECT shape FROM "shapes" WHERE name = 'Punjab'))
    GROUP BY grid, datetime
) AS Records
-- Join the points based on geometry
ON ST_Within(grid, Districts.shape)
-- finally group them together
GROUP BY name
```
Blue Sky Analytics’ initial product offering, BreeZo, is a freely available, high-resolution web application for monitoring air pollution levels in India. They have big plans to expand the reach of their environmental intelligence technology to cover water in 2020, land in 2021, and other environmental factors in the coming years.
All of these new areas will require the Blue Sky Analytics team to handle even larger quantities of complex data, and they have built their architecture around Timescale Cloud so that it scales with their plans for the future.
With a stable, scalable infrastructure in place, Blue Sky Analytics has a significantly less complicated development process and is free to focus on its master goal: becoming the go-to source for environmental data around the world.