A success story: Blue Sky Analytics

Blue Sky Analytics is a big data and AI company that uses geospatial data to monitor environmental conditions. They’re on a mission to be the go-to source for environmental data, starting by creating a better monitoring and climate risk assessment platform for various stakeholders across the globe. The Blue Sky Analytics team selected Timescale Cloud to help them accomplish their goals, tapping into capabilities like advanced analytics, support for SQL, a scalable architecture, and geospatial data integrations.

User: Kshitij Purwar
Title: Chief Technology Officer
Company: Blue Sky Analytics
Industry: Environmental Monitoring & Climate Risk Assessment
Use cases: Geospatial Data Intelligence

“Timescale Cloud is very user-friendly in the sense that it is easy to comprehend and implement — and it’s affordable! More importantly, it is a scalable long-term approach suited to fit our use case.”

— Kshitij Purwar, Blue Sky Analytics, CTO

Why Blue Sky Analytics Needs Timescale Cloud

Blue Sky Analytics is building the “Bloomberg for Environmental Data” to drive sustainable decision making and safeguard the global market from climate-change-induced threats. To do this, Blue Sky Analytics collects geospatial and time-series data to create continuous, high-resolution data sets to monitor pollution levels, water quality, emissions, fires, changes in soil composition, and more.

Data sets like these grow very quickly and can cause complications when stored in a traditional relational database. Since adding Timescale Cloud to their technology stack, the Blue Sky Analytics team spends less time on database management and more time analyzing their high-volume datasets.

Prioritizing data analytics

Previously, Blue Sky Analytics used MySQL for time-series data management, but querying the data required an ad hoc approach (i.e., null filling using multiple cron jobs) that became increasingly difficult as they grew.

Blue Sky Analytics needed a solution that provided advanced analytics, in addition to a scalable architecture, SQL support, and geospatial data capabilities. The team also wanted to get up and running quickly: as a fast-moving startup that grew from 3 to 15 members in just 9 months, they were keen to find something with a minimal learning curve that they could spin up and scale fast. All of these requirements led them to choose Timescale Cloud, the fully hosted and managed version of TimescaleDB.

Now, the Blue Sky Analytics team focuses on deriving insights from their time-series data rather than worrying about maintenance. They use several built-in Timescale Cloud features, such as gap filling, time bucketing, and continuous aggregates, that do the “heavy lifting” on the management side.
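As a rough illustration of time bucketing and continuous aggregates, the sketch below rolls raw readings up into daily averages and keeps the rollup refreshed in the background. The table measurements and the columns recorded_at and aod550 appear in the query shared later in this story. The column station_id, the view name daily_aod550, and the refresh intervals are hypothetical, and the syntax assumes TimescaleDB 2.x. This is not Blue Sky Analytics' actual schema.

```sql
-- Assumption: measurements(recorded_at timestamptz, station_id int, aod550 double precision)
-- already exists as a TimescaleDB hypertable partitioned on recorded_at.

-- Continuous aggregate: one row per station per day, maintained incrementally.
CREATE MATERIALIZED VIEW daily_aod550
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 day', recorded_at) AS day,  -- time bucketing
  station_id,
  avg(aod550) AS avg_aod550
FROM measurements
GROUP BY day, station_id;

-- Refresh the most recent buckets once an hour (intervals are illustrative).
SELECT add_continuous_aggregate_policy('daily_aod550',
  start_offset      => INTERVAL '3 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');

-- Reads then hit the precomputed rollup instead of the raw table.
SELECT day, station_id, avg_aod550
FROM daily_aod550
WHERE day >= now() - INTERVAL '30 days'
ORDER BY station_id, day;
```

A rollup like this replaces the kind of scheduled job that would otherwise recompute daily averages from scratch.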

“Timescale Cloud has been greatly beneficial to us and accelerated our development process. For us, the greatest advantage of using Timescale Cloud is that it enables us to do spatial-temporal queries with ease, which saves us a huge amount of time when we analyze data. No more reading 100s of files or writing custom scripts for each analysis – we just dump it into Timescale Cloud and query using SQL.”

— Kshitij Purwar, Blue Sky Analytics, CTO

Finding the value in JOINs

Blue Sky Analytics collects an enormous amount of data (terabytes of raw data from satellites in orbit and ground monitors) and turns it into environment intelligence. For example, they have 1000+ ground monitors across India and parse data from the Sentinel 5P, Aqua, and Terra satellite missions.

Due to various environmental and other factors, Blue Sky Analytics’ incoming ground and satellite data is often inconsistent and has missing data points. They use built-in Timescale Cloud features to solve these data challenges and compile a smooth trend line that can be used for future analysis.
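To make the gap-filling idea concrete, here is a minimal sketch of how missing days might be filled in with TimescaleDB's time_bucket_gapfill, locf, and interpolate functions. The table measurements and the columns recorded_at and aod550 come from the query shared later in this story. The column station_id is a hypothetical stand-in for however each monitor is identified, and the team's actual queries may differ.

```sql
-- Assumption: measurements(recorded_at timestamptz, station_id int, aod550 double precision).
-- time_bucket_gapfill emits a row for every day in the WHERE range,
-- including days on which a station reported nothing.
SELECT
  time_bucket_gapfill('1 day', recorded_at) AS day,
  station_id,
  avg(aod550)              AS avg_aod550,     -- NULL on days with no readings
  locf(avg(aod550))        AS aod550_locf,    -- last observation carried forward
  interpolate(avg(aod550)) AS aod550_interp   -- linear interpolation between known days
FROM measurements
WHERE recorded_at >= now() - INTERVAL '1 week'
  AND recorded_at <  now()
GROUP BY day, station_id
ORDER BY station_id, day;
```

Either filled column gives a continuous daily series. Which one is the better fit depends on whether the reading is expected to hold steady between observations (locf) or change gradually (interpolate).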

Additionally, since Timescale Cloud works well with PostGIS (a Postgres extension specifically designed to handle geospatial data), the Blue Sky Analytics team can create geospatial JOINs, perform time slicing, and implement null filling more effectively.

Blue Sky Analytics shared one of their complex, powerful geospatial queries, which illustrates what PostGIS and Timescale Cloud can do together.


SELECT
  name,
  -- Aggregates all the points in one district
  json_agg(json_build_object('datetime', datetime, 'u_wind', u_wind, 'v_wind', v_wind, 'albedo', albedo, 'aod469', aod469, 'aod550',
FROM (
  -- Selects all the districts in Punjab
  SELECT
    name,
    shape
  FROM
    "shapes"
  WHERE
    TYPE = 'District'
    -- Selects all the `Districts` in `Punjab` state
    AND ST_WITHIN(shape, (
      SELECT
        shape FROM "shapes"
      WHERE
        name = 'Punjab'))) AS Districts
  LEFT JOIN (
    -- Select all the points in last week with daily average and within Punjab
    SELECT
    time_bucket_gapfill ('1 day', recorded_at, NOW() - interval '1 week', NOW()) AS datetime, grid, avg(u_wind) AS u_wind, avg(v_w
    FROM
      "measurements"
    WHERE
      -- Get the points within 1 week
      recorded_at < NOW()
      AND recorded_at > NOW() - interval '1 week'
      -- Get the point only in Punjab
      AND ST_WITHIN(grid, (
        SELECT
          shape FROM "shapes"
        WHERE
          name = 'Punjab'))
    GROUP BY
      grid, datetime) AS Records
    -- Join the points based on geometry
    ON ST_Within(grid, Districts.shape)
    -- finally group them together
  GROUP BY
    name

Building a scalable architecture

Blue Sky Analytics’ initial product offering, BreeZo, is a freely available web application that monitors air pollution levels in India at high resolution. They have big plans to expand the reach of their environment intelligence technology to cover water in 2020, land in 2021, and other environmental factors in the coming years.

All of these new areas will require the Blue Sky Analytics team to handle even larger quantities of complex data, and they’ve built their architecture, centered around Timescale Cloud, to scale according to their plans for the future.

Overview of Blue Sky Analytics' infrastructure and Timescale Cloud deployment

Blue Sky Analytics uses Node.js at the application layer, ElastiCache to cache expensive queries, and AWS S3 to store raw satellite data. They put AWS Lambda in place to crunch big data into useful insights, storing all resulting data in Timescale Cloud. They combine Timescale Cloud and PostGIS to analyze data.

“Timescale Cloud is a great place to start working with time-series data, and I strongly recommend it to developers. With Timescale Cloud, you can adjust the system to meet your needs in terms of scale and flexibility…the possibilities are endless! It is truly a cutting edge technology that has expedited our mission of commanding the space of environmental data.”

— Kshitij Purwar, Blue Sky Analytics, CTO

Blue Skies ahead with Timescale Cloud

With a stable, scalable infrastructure in place, Blue Sky Analytics has a significantly less complicated development process and is free to focus on accomplishing their master goal: becoming the go-to source for environmental data around the world.
