CloudQuery on using PostgreSQL for cloud assets visibility
This is an installment of our “Community Member Spotlight” series, where we invite our customers to share their work, shining a light on their success and inspiring others with new ways to use technology to solve problems.
In this edition, Ron Eliahu, CTO and co-founder at CloudQuery, joins us to share how they transform data about cloud assets into PostgreSQL tables to give developers visibility into the health of cloud infrastructure. Thanks to TimescaleDB, CloudQuery users manage their data more transparently while maintaining scalability.
CloudQuery is an open-source cloud asset inventory powered by SQL. CloudQuery extracts, transforms, and loads cloud assets into normalized PostgreSQL tables, enabling developers to assess, audit, and monitor the configurations of their cloud assets.
Cloud asset inventory is a key component to solve various challenges in the cloud:
- Cloud Infrastructure Monitoring, Visibility, and Search: Give developers, SREs, DevOps, and security engineers a streamlined way to gain visibility and perform a wide range of tasks. These tasks include security analytics, fleet management auditing, governance, and cost.
- Security & Compliance: Turn security and compliance tasks to data problems and solve them with the best tools and practices in DataOps. Use CloudQuery Policies to codify, version control, and automate security and compliance rules with SQL.
- Infrastructure as Code (IaC) Drift Detection: CloudQuery leverages its asset inventory to quickly detect drift against IaC (Terraform, more to come) which you can run both in the CI and locally.
About the team
We started CloudQuery a year ago, to solve the cloud asset visibility problem, and quickly gained traction. We are currently a small but mighty team of open-source and cloud security enthusiasts, spread all around the world!
A little about myself Ron Eliahu, I am the CTO and co-founder at CloudQuery, I am an engineer at heart, I love building open source projects and working with anything database-related.
About the project
Queryable cloud asset inventory is key in solving a lot of core challenges in the cloud such as security, compliance, search, cost, and IaC drift detection. That is why we started CloudQuery and followed a few key decisions:
- PostgreSQL - The most used database in the world with a huge ecosystem of business intelligence and visualization tools.
- Open-source - To cover a huge amount of API and cloud providers we decided to make this open-source where everyone can contribute without being blocked by a vendor.
- Pluggable Architecture with CloudQuery SDK - Writing plugins, extracting configuration data, transforming it, and loading it to PostgreSQL requires a lot of boilerplate code. To scale this and improve the developer’s experience both internally and externally we released CloudQuery SDK.
Another key observation and requirement that we saw early on for a cloud asset inventory is the ability to not only query the current state but also go back in time. This is super useful for tasks such as forensics, post-mortems, compliance, and more.
This feature required us to maintain historical snapshots in PostgreSQL and we started to look out for a solution, which was quite a journey for us.
Choosing (and using!) TimescaleDB
First attempt: PostgreSQL Partitioning
With some good experience of PostgreSQL under our belt, the first thing we tried is PostgreSQL partitioning Pretty quickly it turned out to be not as easy as expected, hard to maintain and manage, lacking easily creatable retention policies, and bucketing queries. Given our philosophy is to integrate with best-in-class tools and focus our development efforts on our core business use-cases we started looking for an alternative solution.
✨ Editor’s Note: For more comparisons and benchmarks, see how TimescaleDB compares to InfluxDB, MongoDB, AWS Timestream, vanilla PostgreSQL, and other time-series database alternatives on various vectors, from performance and ecosystem to query language and beyond.
Second attempt: TimescaleDB
Given CloudQuery uses PostgreSQL under the hood, supporting historical snapshots in a scalable way usually involves using partitioning. TimescaleDB´s create_hyperfunctions feature allows us to do just that in a simple and effective way giving our users a transparent and automated way to manage their data while still maintaining scalability.
CloudQuery transforms cloud resources into tables, some of these resources have a complex data structure, and are split into multiple relational tables, some of which are hypertables. In order to retain data integrity, we use foreign key relationships (with
ON DELETE CASCADE) to the root resource table. With these foreign keys in place, if a reference to a cloud resource is deleted (for instance a S3 bucket), the downstream data is removed.
While TimescaleDB hypertables do support using foreign keys that reference a regular PostgreSQL table, hypertable cannot be the source reference of a foreign key In our case, some of our "root" reference tables are hypertables which meant that we had to come up with another way to do the cascading deletes to retain data integrity.
A common alternative is to create trigger functions that will cause a delete on the relation table if a row is deleted in the parent table, the issue here is that some resources in CloudQuery can have three or more levels of relations and we didn’t want to create many queries to solve this, so we came up with the following functions to easily create the deletion cascade.
First, we wanted a trigger function that will delete our relational table. We used trigger arguments to pass the relation table name its foreign key name so we can delete the data in the relation table. Full code is available here.
CREATE OR REPLACE FUNCTION history.cascade_delete() RETURNS trigger LANGUAGE 'plpgsql' COST 100 VOLATILE NOT LEAKPROOF AS $BODY$ BEGIN BEGIN IF (TG_OP = 'DELETE') THEN EXECUTE format('DELETE FROM history.%I where %I = %L AND cq_fetch_date = %L', TG_ARGV, TG_ARGV, OLD.cq_id, OLD.cq_fetch_date); RETURN OLD; END IF; RETURN NULL; -- result is ignored since this is an AFTER trigger END; END; $BODY$;
Then, we call the create trigger function on the root table and pass these arguments to the child. Full code is available here.
CREATE OR REPLACE FUNCTION history.build_trigger(_table_name text, _child_table_name text, _parent_id text) RETURNS integer LANGUAGE 'plpgsql' COST 100 VOLATILE PARALLEL UNSAFE AS $BODY$ BEGIN IF NOT EXISTS ( SELECT 1 FROM pg_trigger WHERE tgname = _child_table_name ) then EXECUTE format( 'CREATE TRIGGER %I BEFORE DELETE ON history.%I FOR EACH ROW EXECUTE PROCEDURE history.cascade_delete(%s, %s)'::text, _child_table_name, _table_name, _child_table_name, _parent_id); return 0; ELSE return 1; END IF; END; $BODY$;
To sum it all up, we built two generic SQL functions to make sure all our hypertables and relational hypertables data get deleted if the root table has any data removed.
Compliance overtime is a common request so we are working on integrating the results of CloudQuery Policies with TimescaleDB so you can monitor and visualize compliance with TimescaleDB and Grafana.
Getting started advice & resources
Before you jump into implementing your own partition strategy, definitely give TimescaleDB a try. It can save you a lot of development time and make your product more robust. The Timescale documentation is a great place to start.
We’d like to thank Ron and all folks at the CloudQuery team for sharing their story, as well as for their work to transform complex and scattered cloud assets data into structured and easily accessible tables enabling developers to monitor their cloud inventory.
We’re always keen to feature new community projects and stories on our blog. If you have a story or project you’d like to share, reach out on Slack (@Lucie Šimečková), and we’ll go from there.