Database Management: Behind-the-Scenes Lessons From a Data Architect
I am old. Well, maybe not in earthly years, but definitely in technology years. I remember when we would run applications on mainframe hardware, what Unix really is (or was), countless hours in extremely refrigerated data centers, helping customers edit configuration files over the phone with “vi,” and how difficult it was to help customers run production level software applications.
In some ways, life for technologists has changed dramatically over the course of my professional journey. As an aging technologist (and currently a data architect), I would never have envisioned the compute power, storage capacity, and network throughput available at our fingertips today.
Yet, in other ways, some things always stay the same. To run production-level applications twenty-four seven, database developers need to consider (and address) a series of requirements that are still the same today as they were 30 years ago.
So, let me recap some of the lessons learned over the last three decades that allowed me to hone my database management skills. In this blog post, I’ll go through the four things you should think about when planning to run your database in production:
- Compute resources
Often, developers in our community ask us for advice, wondering if they should invest in a managed database service or build their self-hosted deployment. Our most likely answer is, “It depends” (as with many things in the database world). But, when evaluating both options, try to consider the big picture of everything that goes behind hosting and managing a database in production.
Database Management: Behind the Scenes
Managed databases have made running applications in production considerably simpler for the developer. So much so, in fact, that we sometimes overlook all the work behind them. And remember, this database management workload is something you should plan for if you’re considering hosting your own database in production.
Let’s take a closer look at our own database platform, Timescale Cloud. Timescale Cloud is a managed cloud-native database platform that supercharges PostgreSQL for time series. The result? Fast queries at scale, cost-efficient scaling options to store higher data volumes for less money, and plenty of time-series functionality to save development time.
But Timescale Cloud is much more than just an optimized time-series database: it gives customers access to an entire database environment where they can sign up and have a database running in a matter of seconds.
Timescale Cloud also automatically takes care of many database operations (such as backups and failover or upgrades), and it greatly simplifies others, like creating copies of your database for testing, enabling high availability via replication, or creating read replicas for load reduction.
To create such a seamless experience, managed databases juggle many elements behind the scenes, including physical facilities, compute hardware, virtualization, an operating system, and the database itself.
I’ve spent plenty of time addressing those things myself throughout my career. Let’s take a journey through time and look at what it entails to effectively provide and manage each of these building blocks.
Hosting Databases in Production: Building Blocks
For most customers, the amount of space needed to run a database is rather insignificant. It is everything else around the actual footprint that is difficult to provide for twenty-four-seven applications. In one of my previous jobs, I assisted customers in installing appliance hardware. I spent countless hours in surprisingly cold data centers assisting customers with installation services. Spending time in an actual data center highlights elements needed for proper facilities: cooling, uninterrupted power, and physical security.
Dealing with hardware is now a thing of the past for most young technologists in the software industry. Long gone are the days when we first had to figure out our facilities and then our hardware, which always entailed numerous problems for companies.
For example, by its very nature, computer hardware is, at best, a depreciating asset. When we had to procure and install our own hardware, it was essentially obsolete on the day of installation. Rarely is there a physical item on this Earth with such a short shelf life! I remember spending hours purchasing hardware, physically installing it, running cables, and manually updating BIOSes, only to replace it all soon thereafter.
Today, we can use a cloud service (AWS, for example) to allocate compute resources dynamically. This has greatly simplified the task of deploying the hardware needed to run applications, making things smoother for developers and companies. In Timescale, we’re actually running Timescale Cloud in AWS ourselves, taking advantage of the flexibility of this cloud ecosystem to build our platform.
What we learned from our own experience is that having access to a cloud service is just the tip of the iceberg when dealing with hardware. It is easy to underestimate the effort that goes into it. Sure, it is pretty straightforward to instantiate an Amazon EC2 instance. But do you have the time and flexibility to provide an easy front end to instantiate and change resources on the fly? Can you monitor those resources for consistent uptime and performance? And ultimately, can you automatically adjust when the resources are not performing as expected, twenty-four seven? If reading this paragraph didn’t exhaust you, maybe you can.
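To make the "monitor and adjust" question a bit more concrete, here is a minimal sketch of the kind of decision a monitoring loop has to make over and over. The function name, the thresholds, and the action labels are all assumptions chosen for illustration; they do not describe how Timescale Cloud or AWS actually decide anything.

```python
from statistics import mean


def decide_action(cpu_samples, high=0.80, low=0.20):
    """Pick an action from recent CPU utilization samples (each 0.0 to 1.0).

    The thresholds are hypothetical defaults, not recommendations.
    """
    if not cpu_samples:
        # No data at all usually means the monitoring itself needs attention.
        return "alert"
    average = mean(cpu_samples)
    if average > high:
        return "scale_up"
    if average < low:
        return "scale_down"
    return "hold"
```

The decision itself is the easy part. Collecting reliable samples, acting on the result without downtime, and doing it around the clock is where the real effort goes.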
Deploying your hardware in the cloud doesn’t necessarily simplify many critical database management tasks you will need to perform daily if you’re running a database in production. This aspect often confuses our users, who expect these tasks to be less time-consuming, especially if they’re used to working with managed database services.
So while a cloud service costs a bit, it is nothing compared to the costs if a customer were to build that functionality for themselves. I am happy to no longer deal with the entire facilities and hardware pieces of the puzzle.
Speaking of operating a cloud infrastructure: let’s talk about virtualization.
Virtualization was introduced about midway through my career in technology. I remember quite well how it made certain tasks much easier. For example, I could virtualize the hardware in a multi-node software application, so I didn’t need a full hardware rack but rather a much smaller footprint. It saved physical space and time.
But it came at a cost since nothing is free. There is now a layer of complexity that, if not properly managed, can greatly impact all the layers above, including how the operating systems (O/S) and applications run. Too often, I’ve seen virtualization misused (e.g., purposely overprovisioned), negatively affecting everything else that runs above the virtualization layer. However, virtualization has come a long way in a few short years to where it now affords us the ability to run entire data centers as virtualization centers.
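One simple way to spot overprovisioning is to compare the virtual CPUs promised to guests with the physical cores actually available on the host. The sketch below does exactly that arithmetic. The function name and the example numbers are assumptions for illustration only.

```python
def overcommit_ratio(vcpus_per_vm, physical_cores):
    """Return allocated vCPUs divided by physical cores on one host.

    A result above 1.0 means more virtual CPUs were promised than
    the hardware physically has.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm) / physical_cores


# Four hypothetical VMs sharing a 16-core host.
ratio = overcommit_ratio([4, 4, 8, 16], physical_cores=16)
print(ratio)
```

A ratio above 1.0 is not automatically a problem, since guests rarely peak at the same time, but the higher it climbs, the more likely it is that everything above the virtualization layer will feel it.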
Operating systems are funny things. They go unseen, yet they are crucial to our lives. Even our phones now have an operating system. I’ve seen many different “server” operating systems in my life, from IBM OS/360 to VAX/VMS to just about every Unix variant (SunOS, Solaris, System V), and now Linux and its many distributions. Not many people think about operating systems except when things go bad.
But operating systems are another thing that needs care and feeding, so they need to be on someone’s mind. Benjamin Franklin’s remark that the only certain things in life are death and taxes should be extended to include software and security patches. If you’re thinking about self-hosting your own database, this is another aspect to consider: you’ll need to monitor and maintain your O/S, running upgrades every time a security vulnerability or an important bug needs to be patched.
We now come to the last component: the database itself. Timescale Cloud is built on Postgres. Just like an O/S, the database needs care and feeding. As with every layer before it, Timescale Cloud takes care of database maintenance, including bug and security patches.
But database management in production involves another crucial element: preparing for when things go wrong. That is the only certainty of the production process—unfortunately, things will go wrong sometimes, and you need a plan.
If you’re self-hosting your database, you must define a set of operating procedures that determine what to do when your storage becomes corrupted or your compute fails. At a minimum, you must take backups regularly and test them to ensure your data is safe.
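As a rough illustration of "take a backup, then test it," the sketch below wraps the standard Postgres tools `pg_dump` and `pg_restore`. It assumes both are on the PATH and that connection settings come from the environment. The function names, database name, and file path are hypothetical.

```python
import subprocess


def backup_command(dbname, archive_path):
    """Build a pg_dump command that writes a custom-format archive."""
    return ["pg_dump", "--format=custom", f"--file={archive_path}", dbname]


def verify_command(archive_path):
    """Build a pg_restore command that only lists the archive's contents."""
    return ["pg_restore", "--list", archive_path]


def backup_and_check(dbname, archive_path):
    """Take a backup, then confirm the archive is at least readable."""
    subprocess.run(backup_command(dbname, archive_path), check=True)
    listing = subprocess.run(
        verify_command(archive_path),
        check=True,
        capture_output=True,
        text=True,
    )
    # A non-empty table of contents shows the archive can be parsed.
    return bool(listing.stdout.strip())
```

Listing the archive only proves the file is readable. A real backup test restores it into a scratch database and checks the data, which is the part most of us used to skip.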
But take into account that recovery from a backup can be a rather slow process. You may want to consider setting up an alternative method (e.g., replication) that avoids the potential downtime caused by a database failure. Designing for less downtime in production is often vital because, more often than not, having your database down brings the whole business to a halt. Besides the lost revenue, this is a terrible experience for your end-users.
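A back-of-the-envelope estimate shows why restoring from a backup can be slow. The numbers below are purely hypothetical, and the calculation ignores everything except copying the data (no log replay, no index rebuilds), so treat the result as a lower bound.

```python
def restore_seconds(backup_size_gb, throughput_mb_per_s):
    """Estimate the time needed just to copy a backup back into place."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput_mb_per_s must be positive")
    return backup_size_gb * 1024 / throughput_mb_per_s


# Hypothetical case: a 500 GB backup restored at a sustained 100 MB/s.
seconds = restore_seconds(500, 100)
print(f"{seconds / 3600:.1f} hours")
```

Even with these generous assumptions, the database is down for well over an hour, which is the gap a replica is meant to close.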
Self-Hosting or Managed?
Reminiscing on the past helps to understand all of the behind-the-scenes elements that are at play when managing a production database.
- By choosing to run on a managed platform like Timescale Cloud, you don’t have to worry about the cost and hassles of managing facilities and hardware provisioning. You can deploy databases in one click and scale them as you need. Once you don’t need them anymore, you can delete them and not think about the underlying hardware that powered them ever again.
- Timescale Cloud makes the most of virtualization technology to provide properly allocated resources to end-users. Timescale Cloud customers don’t need to worry about instances since Timescale Cloud does not overprovision compute resources. In addition, the service carefully monitors every virtual machine to ensure each customer has the requested resources.
- Timescale Cloud takes care of maintaining and monitoring the O/S. Timescale Cloud is constantly monitoring to ensure the O/S is running correctly, not to mention the never-ending bug and security patches. As an administrator, I’d be happy never to maintain a server O/S again, which is exactly what Timescale Cloud provides.
- Talking about things I’d be happy never to do again: worrying about backups. Timescale Cloud automatically keeps up-to-date backups by performing full backups weekly and partial backups daily. In the past, we stressed about backups because we weren’t good at performing them consistently, let alone testing them to ensure they would be ready if we ever needed them.
- Besides backups, Timescale Cloud keeps Postgres write-ahead log (WAL) files of any changes made to the database. This ensures the recovery of a database at (and to) any point in time without experiencing data loss.
- Finally, another huge advantage of managed databases (like Timescale Cloud) is that you can easily enable database replication for high availability. This means that when things go south and the production database goes down, another instance automatically spins up in seconds instead of leaving your users down for what could turn into hours. Nobody wants to be called to solve a complicated mess in the middle of the night.
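The backup and WAL points in the list above can be sketched as a toy model of point-in-time recovery: start from the latest full backup taken before the target time, apply the partial backups that follow it, then replay the write-ahead log from the last backup up to the target. This assumes partial backups are incremental (each builds on the previous backup), which the text does not specify. The `Backup` class and `restore_plan` function are hypothetical and are not Timescale Cloud's implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "partial"


def restore_plan(backups, target):
    """Return (backups to restore in order, time to start WAL replay from)."""
    usable = sorted(
        (b for b in backups if b.taken_at <= target),
        key=lambda b: b.taken_at,
    )
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup exists at or before the target time")
    base = fulls[-1]
    # Partial backups taken after the base and before the target.
    partials = [
        b for b in usable
        if b.kind == "partial" and b.taken_at > base.taken_at
    ]
    chain = [base] + partials
    # WAL replay covers the gap between the last backup and the target.
    return chain, chain[-1].taken_at
```

For example, with a full backup on day 1, partial backups on days 2 through 4, and a target at noon on day 3, the plan restores the full backup and the first two partial backups, then replays the WAL from the start of day 3 up to noon.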
But that’s not all. The dynamic cloud-native architecture of Timescale Cloud allows us to provide many more services that simplify your daily tasks, so you can focus on your applications instead of operating your database.
For example, Timescale Cloud offers a single-button “fork” mechanism that duplicates an entire environment (hardware resources, O/S, database, application code, and data).
Not so long ago, we had to duplicate the entire facilities-to-application stack manually. Even if we had the components readily available, it still took hours, if not days. Duplicating an entire environment with the click of a button is precisely what application engineers need in order to focus on critical tasks. They should spend their time writing and testing applications, not on the platform- or system-level tasks that surround them.
And once you are done with a fork, you can simply delete it, so you only pay for the resources that task needed. In other words, one can go through an entire quality assurance cycle and spend only pennies.
The Future Is Cloud-First
I guess I am old in a way. But going through these experiences helped me understand the great value of Timescale Cloud. When I work with customers, I no longer need to freeze in a data center helping a customer install yet another rack-mounted piece of hardware. Timescale Cloud customers no longer need to buy hardware that essentially becomes obsolete the day it is bought and installed.
Long gone are the days of waiting for new hardware to increase the capacity of a system since virtualization allows for dynamic resource allocation. And can I say how much I don’t miss keeping up on O/S and security patches?
Lastly, what I hope Timescale Cloud customers will appreciate, just as much as everything else, is the expertise available to help them from a worldwide Support team. I started my career in technology as a support engineer. This was well before the modern support mentality—we actually spoke with customers either on the phone or via personal email interaction.
Timescale Cloud Support is fully staffed, offering the same high-touch support. They’re ready to help on a myriad of topics, such as data migration, schema design, data modeling, query or ingest performance, compression settings, and more, providing in-depth consultative support at no additional charge.
Many times we simply forget all that goes into providing a twenty-four-seven database platform. My hope is Timescale Cloud customers occasionally take a moment to think about our technology journey and appreciate that they can concentrate on their applications and don’t need to worry about how Timescale Cloud has learned from the past to provide a database platform for the future.
Embark on this journey and start prioritizing your applications: sign up for Timescale Cloud. It is free for 30 days, no credit card required.