Why you should use a relational database instead of NoSQL for your IoT application
IoT data is complex and needs to be accessed by multiple users, so don’t make the mistake of creating data silos.
In almost every industry, there is a digital transformation underway driven by IoT data. What’s important to recognize is that IoT isn’t about things; it’s about the data those things create and collect. Organizations rely on this data to provide better user experiences, to make smarter business decisions, and ultimately fuel their growth.
However, none of this is possible without a reliable database that is able to handle the massive amounts of data generated by IoT devices. Relational databases are known for being flexible, easy to work with, and mature. What they aren’t particularly known for is scale, which prompted the creation of NoSQL databases. As you may or may not already know, there are ways to overcome this disadvantage.
Another thing to note is that IoT data is time-series in nature. By using a time-series database like TimescaleDB, organizations can leverage the insights hidden in machine generated data to build new features, automate processes, and drive efficiency (more on this later). Often times engineering teams will end up storing data in multiple databases: metadata in a relational database, time-series data in a NoSQL store. Don’t do this.
Take advantage of SQL and its ecosystem
IoT data requires diverse, customizable ingest pipelines that demand a database with a broad ecosystem. To meet these requirements, developers need not look further than SQL.
Relational databases and SQL come hand in hand, and many people (i.e. internal data analysts, application developers, or external users looking to access data in real-time) across organizations often already know SQL. For example, in manufacturing there are teams that may want to monitor for equipment maintenance and predict malfunctions, teams that may want to track productivity and shipping logistics data, and so on. SQL makes it easy for them.
Plus there are some really cool functions such as:
- JOINS: combine rows from two or more tables, based on a related column between them
- Aggregations: where the values of multiple rows are grouped together to form a single summary value (i.e. MIN, MAX, AVG)
- Window functions: operate on a set of rows and return a single value for each row from the underlying query (i.e. PARTITION BY, ORDER BY)
- Common table expressions (CTEs): simplify complex joins and subqueries (i.e. WITH)
- ROLLUPS: an extension of the GROUP BY clause that allows you to use a single query to generate multiple grouping sets
Additionally, developers often want to build applications on top of their existing IoT infrastructure. SQL is compatible with many administration tools, streaming pipelines like Kafka or RabbitMQ, messaging protocols like MQTT, visualization tools like Seeq, industrial automation platforms like Ignition, and extensions for working with geospatial and other datatypes, among many others.
Schemas are a good thing
With relational databases you can use schemas to aid with data modeling. Although it may often seem as if “schema-less” databases make it easier to get started, they result in significant technical debt down the road. Users often have to make design decisions about how their data will be stored up front, and these decisions are hard to change in the future. This means that new querying patterns aren’t well supported if they require different settings to be performant.
On the other hand, building schemas upfront with SQL actually enables complex queries. Users can also adjust and update their schemas using a collection of DDL (data definition language) commands. However, it is important to model data correctly to improve performance. Creating the appropriate indexes and table schema for a given workload can result in significant performance improvements. Conversely, designing the wrong schema can result in significant performance degradation.
Essentially what you want is a flexible schema, particularly when storing semi-structured data (e.g., storing readings from IoT sensors collecting varying measurements). You also want a database that can manage and access data flexibly. Particularly within IoT, devices that you collect data from are not always online, resulting in out of order data that is uploaded in batches. You also might want to update incorrect sensor measurements. All of these functions are well supported by the relational model.
Eliminate data silos
We’ve already touched on the fact that many people throughout an organization already know SQL which allows the data to be accessed by multiple users. From our customers, we often hear that they want a time-series database combined with a full relational system, and they want to be able to join this data.
Fortunately, relational databases support JOINs and eliminate the need to store data in multiple locations. By doing so, organizations also save the overhead costs of operating multiple systems. Additionally, they can avoid integrity issues associated with maintaining separate databases, which leads us to our next point.
Count on relational databases for reliability
Many organizations storing sensitive data rely on relational databases to keep their information safe. After all, relational databases have been around since the 70s, and have a proven track record of keeping Fortune 500 companies data safe.
IoT applications often have to handle a lot of complex queries and transactions. With a relational database, you can ensure these transactions will be processes reliability thanks to ACID (Atomicity, Consistency, Isolation, Durability). If you aren’t familiar, ACID is a set of properties that are used when modifying a database. They guarantee that transactions are valid even in the event that you may encounter an error, power failure, crash, etc.
Choose PostgreSQL + TimescaleDB for IoT
If you are looking for a relational database for IoT, we suggest that you choose PostgreSQL. While we may seem partial, PostgreSQL’s popularity remains steadfast and was named the top DBMS of the year by DB-Engines for the second time in a row:
“First released in 1989, PostgreSQL turns 30 this year and is at the peak of its popularity, showing no signs of aging with a very active community. PostgreSQL has established itself as the preferred data store among a large crowd of developers, loved for its stability and feature set.”
While there are other relational database management systems out there, PostgreSQL plus TimescaleDB offer significant advantages for IoT developers.
We encourage active TimescaleDB users to join our 2,000+ member-strong Slack community and post any questions you may have there. If you are looking for enterprise-grade support and assistance, please let us know.
Interested in learning more? Follow us on Twitter or sign up below to receive more posts like this!