Why we introduced telemetry in the latest release of TimescaleDB
As of version 0.12, TimescaleDB will come with opt-out, non-personally identifiable telemetry and version checking.
Starting with version 0.12, TimescaleDB will collect non-personally identifiable telemetry data to better understand usage. The telemetry service will also check for version updates and notify our users if an update is available. Given our commitment to transparency with our community, we wanted to take this chance to discuss the subject of telemetry head-on and explain our motivations for introducing it. This is particularly important for us as an open-source company, because we face the challenge of balancing the permissiveness of the Apache license with a need to gain visibility into how our product is being adopted.
Open source is the foundation of who we are
The genesis of TimescaleDB is a testament to how leveraging open source can result in better software, faster. Several years ago, the Timescale team was building a proprietary stack specifically for IoT. The product itself relied mostly on open-source software and connected different best-of-breed tools to provide a full-stack experience. However, the open-source NoSQL time-series database we were using was proving to be a bottleneck. Long story short, we ended up building what we now call TimescaleDB. By extending open-source PostgreSQL, we built a database that was optimized for time-series data and ready for mission-critical workloads much faster than if we had built the database from scratch.
All this is to say that our entire team is deeply invested in preserving the spirit of open-source software. We want our product to be easy to download, test, adopt, and extend to suit whatever stack best fits our users’ needs. TimescaleDB itself is open source, and we’ve contributed several open-source tools to make it work better with other popular open-source software. Our Grafana PostgreSQL data source connector, Prometheus adapter, and pg_prometheus PostgreSQL extension are all examples of open-source tools that we have built for TimescaleDB as well as PostgreSQL. Our growing community continues to battle-test our product while helping us innovate by providing valuable product feedback and new use cases.
Why telemetry, and why now
First and foremost, telemetry helps us build a better product that directly meets user needs by giving us the data we need to prioritize which features to build. By developing a data-driven understanding of product usage, we can build product features that are more likely to benefit a larger subset of our users. For example, we could choose to improve the installation process for Linux first if most of our users use Linux. After building a new feature, we can gauge its impact on our user base through usage metrics.
Secondly, we need telemetry in order to understand how we are doing as a business. The reality is that building and improving TimescaleDB depends on developing a self-sustaining business model. We need to support a dedicated team of employees who work daily to build features that enable better data analysis, faster performance, and improved data lifecycle management. This means building a product that the open-source community adopts, as well as generating enough value that certain users ultimately choose to pay for our product. The telemetry we collect helps us understand whether or not we are moving in the right direction by giving us a more accurate gauge of how usage is trending over time. As the business succeeds, we can funnel those resources back into supporting and improving our existing product.
The core tenets of TimescaleDB telemetry
As big users of open-source software ourselves, we are intimately familiar with the user concerns around telemetry. We believe that communicating how and why we collect telemetry will help our users make informed decisions on whether or not to allow Timescale to collect data.
Privacy first: We collect basic metrics around operating system types, database versions, number of hypertables, and relevant extensions. Data collected through our telemetry service is completely anonymous and does not contain personally identifiable information. We do not collect any of the data users store in their databases. We strive to limit the metrics we collect to only those we need to drive decisions relevant to TimescaleDB. In addition to collecting metrics, our telemetry service also performs version checks.
Full transparency: We clearly communicate our use of telemetry to our users through a variety of channels. We print messages both in the terminal and in the logs when users first create the TimescaleDB extension. We document the metrics we are sending to our server and provide instructions on how to turn telemetry off. We’ve also included a SELECT statement that lets users inspect the data being sent to Timescale. Any changes we make in future versions will be reflected in the same ways.
Optionality: We urge our open-source community to join us in making our product better through telemetry, but we also understand that in certain scenarios, keeping telemetry enabled is simply not an option. Telemetry is not a requirement for using TimescaleDB, and you can turn it off by setting a single configuration variable. We opt our users into telemetry by default in the hope of encouraging more users to help us collect data. We hope that our transparent documentation will ensure that users who do not want to send telemetry have adequate information with which to turn it off.
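The inspection and opt-out steps described above can be sketched roughly as follows. This post does not spell out the exact names, so treat get_telemetry_report() and timescaledb.telemetry_level as assumptions drawn from the TimescaleDB documentation, and check the docs for your installed version before relying on them.

```sql
-- Assumed function name: returns the report that would be sent to Timescale,
-- so you can review it before deciding whether to keep telemetry on.
SELECT get_telemetry_report();

-- Assumed setting name: turn telemetry off for the whole instance.
-- ALTER SYSTEM requires superuser rights and writes to postgresql.auto.conf.
ALTER SYSTEM SET timescaledb.telemetry_level = off;

-- Ask PostgreSQL to re-read its configuration files.
SELECT pg_reload_conf();

-- Check the value currently in effect for this session.
SHOW timescaledb.telemetry_level;
```

The same variable can also be set directly in postgresql.conf if you prefer to manage configuration in files rather than through SQL.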
A request to our open-source community
As the open-source ecosystem continues to evolve, we fully anticipate evolving with it. However, we will always strive to communicate our decisions to users transparently. Our ask of the TimescaleDB community is to continue engaging with us to improve our product, whether through telemetry or product feedback. If you are a member of our community, we hope you will continue to help us drive innovation in our product, and we invite you to join us on our journey towards making the best open-source time-series database on the market.