Timescale Timeout: A Chat About Time Series for Analytics
Welcome to the first installment of Timescale Timeout, a series of dev-to-dev talks about data-intensive applications, from individual projects to industry technologies, from best moments to lessons learned by cloud-native application developers.
In this video, Timescale’s developer advocate Chris Engelbert chats with Florian Herrengt, co-founder of Nocodelytics, about his analytics platform, its main challenges, and his data stack options (you can also read the interview transcript below). Also, make sure not to miss Florian’s guest blog post in our Developer Q&A series to learn more about how Nocodelytics works with TimescaleDB and how they’re reducing query costs using a wide table layout.
Chris Engelbert: Hello, Florian! I'm super excited to have you here for our first episode of… Well, we're not sure what it's called yet! It may be the Timescale Podcast [Editor’s Note: it’s not]. It may be just random people talking. We’ve got to figure that out in the future, but I'm super excited to have you here for whatever the first episode is. So could you maybe just quickly introduce yourself?
Florian Herrengt: Yes, I'm super excited to be on the first episode of the podcast or whatever it is that you're doing. I'm Florian Herrengt; I’ve been a developer for something like 10 years now. I worked for startups, big companies, and a big operator, and at the moment, I'm building Nocodelytics, which is an analytics platform for the site builder Webflow. We are ingesting a lot of data, and that's why it's going to be very interesting to talk about TimescaleDB and how we switched to it recently and used it to scale our platform.
Chris Engelbert: Right? So you mentioned Nocodelytics. Can you go a little bit deeper into what you do? You said Webflow analytics?
Florian Herrengt: Yeah. So, essentially you sign up with Webflow, you get redirected to some kind of dashboard, and boom, there you have all the data. We collect all the data, and then you can create metrics to say, “Oh, I want to know how many people clicked on this button,” or “I want to know how many people viewed this page.” There's literally nothing for you to do after setup. One of the interesting things is that Webflow has this concept of CMS, so it's a kind of database. You insert items into your table, and then you can render HTML from that, and we try [to instrument] that automatically.
So when people create items in their database and then display them on their website, it is automatically reflected in the metric because we know what's in the database. We know what people click on, so we can say exactly what item has been clicked, where, on what page, etc.
Chris Engelbert: Right? Okay. So that means you're collecting a lot of clicks and probably page transitions and stuff like that. Is that what you store in TimescaleDB right now?
Florian Herrengt: Yes, it is.
Chris Engelbert: All right. So how did you start? Did you start with TimescaleDB in the beginning, or how did your journey go? How did you end up here?
Florian Herrengt: At the beginning, we actually didn't know if this would work, right? We just built it to see. We tried to move as quickly as possible. We're still out, but at that point, it was really an MVP. So I just went with what I knew best, and that's PostgreSQL. I don't really have much experience with NoSQL or anything like that. So, you know, the classic—boom!— PostgreSQL. I've been working with it for years. Let's use this, and it worked. I mean, it works really well. But as we started to ingest more and more data, it became completely unsustainable, like we literally couldn't handle the queries. Some queries would take 20 minutes. Even if you’re patient, that takes forever. It’s just way too long.
So I was looking for a solution. I looked at everything people were suggesting. There's a fairly long list of things to look at: there were probably three or four that were worth my time at that point. And TimescaleDB was the chosen solution because it's PostgreSQL, basically.
Chris Engelbert: Okay. So the reason you chose TimescaleDB in the end is because it's built on top of PostgreSQL, and you wanted to stay in the ecosystem, or was that just because you said you had the most experience with it?
Florian Herrengt: That was one of the big deciding factors. But to be honest, to begin with, I was reading the documentation of all the different solutions, right? I was making little, tiny projects, trying to create a few million rows and see what would happen. And as I was reading the documentation of Timescale, I was actually learning things about PostgreSQL, things not even related to TimescaleDB. I was literally reading, and I'd be like, “Hey? I could do that with PostgreSQL?” And I would do it. I would see a pretty big performance improvement, and then I was trying to dig into this whole concept of time series and this thing… I think it's called pg_partman. I have to double-check; I can't exactly remember, but essentially what it does is create a table for, let's say, every month, and you store the data for that month in that table. So when you're querying for that month, it reads data only from that table. And I was going down the rabbit hole, thinking, “My god, this is a lot.” This is complicated, you know, and surely someone thought this through already. I'm not the only one dealing with a large amount of time-series data. So let's give it a go. Let's try Timescale and see what happens.
And it was literally just like dropping [the data]. I didn't change anything. I just spun up a TimescaleDB instance, migrated my data, and it was working. And the queries were taking only a few seconds, like 5-10 seconds instead of 20 minutes, without me doing anything.
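To make the monthly-partition idea Florian describes a bit more concrete, here is a toy, in-memory Python sketch of a table range-partitioned by month. It is not how pg_partman or TimescaleDB are implemented; the class, the function, and the `events_pYYYY_MM` naming scheme are all hypothetical. It only shows the two properties that matter here: each row is routed to exactly one partition by its timestamp, and a query for one month reads only that month's partition.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_name(table: str, ts: datetime) -> str:
    """Name of the (hypothetical) monthly partition that holds `ts`."""
    return f"{table}_p{ts.year:04d}_{ts.month:02d}"


class MonthlyPartitionedTable:
    """Toy stand-in for a table range-partitioned by month."""

    def __init__(self, name: str):
        self.name = name
        # partition name -> rows stored in that partition
        self.partitions: dict[str, list[tuple[datetime, str]]] = defaultdict(list)

    def insert(self, ts: datetime, event: str) -> None:
        # Routing: the timestamp alone decides which partition receives the row.
        self.partitions[partition_name(self.name, ts)].append((ts, event))

    def query_month(self, year: int, month: int) -> list[str]:
        # Pruning: only the matching partition is read; .get avoids
        # creating an empty partition for months with no data.
        key = partition_name(self.name, datetime(year, month, 1, tzinfo=timezone.utc))
        return [event for _, event in self.partitions.get(key, [])]
```

For example, inserting a click on January 15 and another on February 3 creates two partitions, and `query_month(2023, 1)` never looks at February's rows. Doing this by hand in SQL means creating, naming, and maintaining those child tables yourself, which is the bookkeeping Florian wanted to avoid.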
Chris Engelbert: Oh, that's why we’re here. That’s the whole point: trying to be as unintrusive as possible while making things fast. So you mentioned pg_partman, which is the automatic partition manager. You can totally do that in PostgreSQL on your own, but let's be blunt, that's no fun. pg_partman makes it a little bit easier, but TimescaleDB gives you basically all of that for free, plus all the benefits of PostgreSQL. But did you look into anything else? You said you read the documentation of a few options. But did you actually try any of the other solutions?
Florian Herrengt: Yeah, we are hosted on AWS, so my go-to was [Amazon] Timestream, which is AWS's time-series database. But there are a few limitations. For instance, one of the limitations was that you couldn't delete data. It's only UPSERT. You define a window, and that’s it. But it's very important for us to be GDPR-compliant, so being able to delete data easily is important. If a user goes to this website and decides, “I don’t want to have anything to do with you anymore,” they can delete it [their account] immediately. That's quite important.
And there was AWS Athena, which I think is probably the closest to what I was looking for. It uses something like Parquet files, which are essentially columnar and compressed. So if you're reading only one column, you only read that column, and it's compressed, decompressed, and encrypted on the fly. It's SQL; you can literally write SQL queries. But the ingestion process is a little bit… There’s a bit of overhead because you need to go through something like Kinesis or Firehose. Then it goes to S3, and from S3, you can read it. There are some limitations as well in terms of how many [Parquet files] you can query. I was kind of, you know, spending more and more time trying to understand the limitations of the product rather than the features. “Can I do this?” or “Can I do that?”
To the point where I was like, “You know what? TimescaleDB really does the job. It works really well. So I might just go with that.” And everything else: we have Grafana dashboards and all these sorts of things that are set up and rely on PostgreSQL. I didn’t have to change anything.
I looked at something called Influx, but it was the same thing. You know, I'd have to learn a new tool and all these new things. And there are these features and bugs, all these things I'd have to deal with, and I'd also have to learn a completely new way of doing things in the database. So honestly, TimescaleDB works well. I have time series, I have compression, I can distribute my data—it has everything I need, really, without having to change anything. It's perfect.
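Florian's point about Parquet being columnar ("if you're reading only one column, you only read that column") is easy to see in a small sketch. The Python below is a conceptual illustration with made-up field names; Parquet has its own file format and encodings, and this is not its API. It pivots row-oriented events into one list per column, shows that a single-column aggregate touches only that list, and adds a simple run-length encoding to hint at why a column of repetitive values compresses well.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Row layout: all fields of one event are stored together.
    ts: int
    page: str
    device: str
    duration_ms: int


def rows_to_columns(rows: list[Event]) -> dict[str, list]:
    """Pivot row-oriented events into one list per column (columnar layout)."""
    return {
        "ts": [r.ts for r in rows],
        "page": [r.page for r in rows],
        "device": [r.device for r in rows],
        "duration_ms": [r.duration_ms for r in rows],
    }


def total_duration(columns: dict[str, list]) -> int:
    # A single-column aggregate reads only "duration_ms";
    # the other columns are never touched.
    return sum(columns["duration_ms"])


def run_length_encode(values: list) -> list[tuple[object, int]]:
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    encoded: list[tuple[object, int]] = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded
```

In a row layout, summing `duration_ms` means walking over every field of every event. In the columnar layout it is one contiguous list, and a low-cardinality column such as `device` shrinks to a handful of (value, count) pairs.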
Chris Engelbert: Right. I mean, that makes perfect sense. It was kind of the same story with my own startup, where we also looked into other things, and it was like, “Why would I use a new language that I have to learn from the ground up instead of just staying with SQL?” It was kind of the same story. We had PostgreSQL. I loved it for years, and I was basically sold the second I saw TimescaleDB.
But that was really interesting. So you looked into the AWS services, and they had certain limitations, and you found more limitations than you anticipated. On the other hand, when you look at TimescaleDB, you said a lot of things literally just worked. It was a drop-in replacement, basically. When you look at TimescaleDB, what would you say is your favorite feature? If somebody on the street asked you, “What is the cool thing about TimescaleDB? Why did you use it, or why do you love it?” What would you say?
Florian Herrengt: It just works™, you know. It’s one of those things where you don’t have to spend a lot of time. You just learn as you go. You're like, “Oh, I wish I could do that.” There's something called approximate distinct count, and it's because we have this thing where, when you reach, let's say, 200,000 visitors, it doesn’t matter anymore whether you have 1,001 or 1,002. You end up having this very interesting problem. I didn't know, but counting things is actually very serious business. It’s very hard in PostgreSQL to just do a count. It's very expensive. So TimescaleDB provides you with—okay, boom!—approximate distinct count. It's there already. And it's really good to be able to just start using it. And then, you know, you have this huge room for growth. And, as I said, so far for me it just works.
All the other things I've tried, it was always a little bit like, “How do I do this? How do I do that? Why am I charged per query and gigabyte? And how much is that going to cost me per month? And how many queries can I handle from one single instance?” All these things were kind of really difficult. The simplicity of TimescaleDB is really what makes it shine.
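Approximate distinct counting is usually built on HyperLogLog, which trades a small, bounded error for a fixed, tiny amount of memory. Below is a minimal from-scratch Python sketch of the HyperLogLog idea, assuming 4,096 registers and a SHA-1-derived 64-bit hash; it is only an illustration of the technique, not the Timescale Toolkit's implementation or API.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch: estimates distinct counts in fixed memory."""

    def __init__(self, p: int = 12):
        self.p = p                   # number of index bits
        self.m = 1 << p              # number of registers (4096 for p=12)
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, item: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1 (deterministic).
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting) for low cardinalities.
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

With 4,096 registers the typical relative error is around 1.6%, which is exactly the "1,001 or 1,002 doesn't matter" trade-off Florian describes. Adding the same visitor twice never changes the registers, so duplicates are free, and the memory used stays constant no matter how many visitors you count.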
Chris Engelbert: Right. And it's interesting that you’re going to be charged per query. A lot of the viewers or listeners may be a little bit younger, including me. It was a very common pattern in the mainframe era, where you actually had to pay per query, and people hated it. So they started to self-host, and now we're going back to the same thing. It's interesting in technology how stuff always comes back to you. Just think of microservices and SOAP and whatever, right? We had all of that.
Florian Herrengt: I guess it all depends on how much money you have in the bank and how much you're willing to throw at the problem. We're a small bootstrapped startup, and when we were just getting started, a really big customer signed up one day, and everything went red. You know, all the alerts just triggered. We had, I don't exactly remember, a lot of events in the queue, and it was backing up and all these things. I was like, “Oh, okay, let's scale up this and this,” and you can manage it a little bit. If I were on full serverless and paid per query, everything would have been instant, but I would have had to pay, I don't know, like 10 grand at the end of the month. It's good, and it's bad at the same time.
Chris Engelbert: Right? It's a good problem to have because the customer signed up. It's a bad problem to have if you need to make sure that you still stay inside the estimated cost. I think that is the actual problem.
Florian Herrengt: Yeah, keeping control of your costs is something that is remarkably… If you've only worked for big companies, corporates, or startups with a lot of money, VC-backed funding, etc., it's not really something you think about. For the first five years of my career, I didn't even think about it. I was just like, “Oh, I need this.” Click, click, and it just works, you know? It's only when you actually start paying for it that you realize how much these things cost. And you're like, “Oh, okay, yeah. Maybe I need to do something about this.”
Chris Engelbert: And that is the interesting thing. All the cloud providers make it super easy to just click without thinking about it, right? It doesn't matter if it's AWS or anyone else. They all make it super simple. But since we already talk about AWS, that kind of goes into deployment. We talked a lot about what data and how you store it, or how it can be stored in Timescale, but what does your deployment look like?
Florian Herrengt: The deployment of TimescaleDB?
Chris Engelbert: Yeah, right. How do you deploy TimescaleDB?
Florian Herrengt: Well, in the beginning, it was literally just an EC2 instance. Like old-fashioned. It was the first time I'd done that in my life. I just, you know, spun up an EC2 instance, manually installed everything, and then connected to it. And it worked.
Then you start thinking, “Oh, but what about high availability? What about backup? Read replicas?” So, in the end, we've ended up doing a Kubernetes deployment, which once again, and honestly, I know I'm repeating myself, but it just works. It's impressive. You literally just follow the docs step by step. There are not too many steps. That's it. Boom! You go like TimescaleDB, read replicas. It does some backups. I mean, it's just really good.
Chris Engelbert: All right. Well, I love to hear that. I mean, there is a reason why I joined Timescale. I always had the same feeling. I love the product. And now I'm here, right? But one thing you said is really hilarious to me. You talk about a virtual machine setup as old-fashioned. That is super hilarious because we haven't had virtual machines for that long. What is it? Ten, fifteen years, maybe? I mean, it is a bit of time, especially in computer technology, I give you that, but old-fashioned? I love that.
Florian Herrengt: I mean, when I started, my first job was in 2009, and we already had VMware deployment. I was a junior, literally getting started. I had no idea what was going on here. But I remember VMware and deploying things with PHP, just dragging and dropping files onto the FTP server. It kind of had the good thing of being instant. You're just dragging and dropping files. But over time, to be honest, when I really started to build serious software outside of banks, it's always been something like a container. Docker came in and then kind of never left.
Chris Engelbert: Right. So you said right now it's Kubernetes. Did you use the Timescale Helm Chart for the deployment, or did you set it up yourself?
Florian Herrengt: I have set it up myself. The Helm Chart is good, but, as I said, we're very mindful about the money we're spending and all these other things. And the Helm Chart is good, but it’s spinning up way too many things. Keeping control of things is quite important for me. At the moment, at least, you know? So, the Helm Chart—I tried it. It worked really well, and then I was like, “What is this? What is this? I don't know what this is,” and I didn't want to dig into it.
Chris Engelbert: That makes sense.
Florian Herrengt: But, there's no reason not to use it. It was more like, “Okay, I already spent so much time on all these things. I need to move on and start doing other things.” So I just went back to what I knew.
Chris Engelbert: Well, that's fair enough. You may like to hear that there is a lot of stuff coming to the Helm Chart. It wasn't really maintained nicely for quite a while, but there are some plans to optimize it and probably make it available for other cloud vendors.
For us, it was mostly that we were on Azure, and the Helm Chart was completely built for AWS. So we forked it and made some changes ourselves. But, as far as I've heard, there's quite a substantial number of things coming, especially more configuration options. More people are asking for stuff because more people are deploying it in Kubernetes.
Florian Herrengt: Yeah, I did see that there are a lot of options, but at that point, I was just like, “I need a database. I really need a read-replica and some backup. That's it.” But yeah, it looks pretty cool. And I think if I have some time, I’ll look into it again.
Chris Engelbert: What are you using for backups and managing the replication or failure scenarios?
Florian Herrengt: It's a cron job on Kubernetes that just does the pg_dump and then uploads it to S3.
Chris Engelbert: Oh, wow, okay. That is basic. Yeah, but it probably just works, right?
Florian Herrengt: Yes, it does, and there's no complexity. I like how simple it is. You know, it just runs, does the pg_dump, compresses it, and then sends it to S3. And then, if I need to roll back or anything like that, I can download it and restore it from my computer. It’s just me.
Chris Engelbert: Oh, that makes sense.
Florian Herrengt: We don’t have 20 engineers trying to roll back the database every two weeks.
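A backup job like the one Florian describes can be as small as two commands. The Python sketch below builds and runs them; it assumes `pg_dump` and the AWS CLI are installed in the job's container, and the bucket name, key layout, and `/tmp` path are hypothetical. Florian's actual job may differ. `--format=custom` writes pg_dump's compressed archive format, which `pg_restore` reads back.

```python
import subprocess
from datetime import datetime, timezone


def backup_commands(database_url: str, bucket: str, now: datetime) -> list[list[str]]:
    """Build the commands for one backup run (paths and key layout are hypothetical)."""
    stamp = now.strftime("%Y-%m-%dT%H%M%SZ")
    dump_file = f"/tmp/timescaledb-{stamp}.dump"
    return [
        # Custom-format dump: compressed, and restorable with pg_restore.
        ["pg_dump", "--format=custom", f"--file={dump_file}", database_url],
        # Upload with the AWS CLI to an assumed bucket/prefix.
        ["aws", "s3", "cp", dump_file, f"s3://{bucket}/backups/{stamp}.dump"],
    ]


def run_backup(database_url: str, bucket: str) -> None:
    # Run each step in order; check=True stops at the first failing command.
    for cmd in backup_commands(database_url, bucket, datetime.now(timezone.utc)):
        subprocess.run(cmd, check=True)
```

A Kubernetes CronJob could invoke `run_backup` on a schedule. Keeping the command construction in a pure function makes the job easy to test without a database. When restoring a TimescaleDB dump with `pg_restore`, the TimescaleDB backup documentation asks you to call `timescaledb_pre_restore()` before and `timescaledb_post_restore()` after the restore; check the docs for your version.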
Chris Engelbert: Well, I thought you might be using something like pgBackRest or other tools that also give you incremental backups. Sometimes it's nice because it's a little bit easier or less space-consuming.
Florian Herrengt: Exactly. Yes. I’ll have to do that eventually, incremental backups, to stop paying a lot of money for them. To be fair, I was looking at it, and I think it's one cent per gigabyte or something like that. I thought, “My time is just more valuable than the storage.”
Chris Engelbert: Makes sense. So wait for the new Helm Chart. It has all of that already built-in. You are probably good at that point in time.
But in general, I'd say that is a really cool setup. I mean, we used k8s, and I know a lot of people that host it in Kubernetes, and it, as you said, it literally just works. But on the other hand, we are talking about Timescale. So have you tried Timescale's cloud offering?
Florian Herrengt: Oh, yes, I have. Actually, that was my go-to, and to be completely honest with you, it didn't quite work for a few reasons. And we don’t know why. I spent quite some time with the Support [Team] trying to understand it. Support is great, by the way, like really good. I think I just didn't have the time, to be honest, to spend a week digging into it. But they were very responsive, replied very quickly, and were very friendly. Everyone was really nice. At that point, you know, you have to get back into the context of having the queue backing up and the queries taking forever. Everything's on fire, so you need to position. But yeah, for some reason, the query latency was really, really high between the EC2 instances and the database. I do not know why. I know about the VPC peering and things like that, but as soon as I started hosting on the same EC2 instances, it was pretty much, you know, like milliseconds or sub-seconds. So I don't know why.
Chris Engelbert: Yeah, well, it could be the data centers. I guess that you guys are probably located in European data centers because of GDPR (Editor's Note: whereas at that time, Timescale only had U.S. data centers). By the way, Timescale now has, I think, Frankfurt as a zone, and we have had VPC peering for a while. So that could be a really interesting thing. There's always a tomorrow, right? Maybe just go ahead and retry it.
Florian Herrengt: Oh, yeah, I haven't tried the VPC peering. I think we were located in US East, but I don't remember. But yeah.
Chris Engelbert: Well, there is only so fast light can go, or so I heard. I think there was some physicist who claimed that we can't go faster than light. Well, maybe we will manage in the future. So we talked about what data is stored. We talked about how you deploy it. How do you actually design your tables? Is it kind of the same design you used in PostgreSQL before? Or did you have to rethink the design slightly to better fit time-series data management?
Florian Herrengt: Yes, I had to change the schema. I don't think it's TimescaleDB-specific, but I did learn it from TimescaleDB.
Chris Engelbert: We love to share.
Florian Herrengt: Exactly. I started the project in a very, very standard way, you know: relationships everywhere, foreign keys, etc. At some point, I had seven JOINs in a query. And if you try seven JOINs on a multi-million-row table, it just doesn't work. The cost is through the roof. So I changed to this thing called a wide table model. Wide versus narrow. And the wide table model is essentially just repeating the data. It doesn’t have foreign keys. One example is the device information: instead of creating a new device in the database and then referencing it through a device ID foreign key, you store the device information on the row itself. It dramatically increased the speed of the queries.
And, as we said before, we are obviously going to massively increase the amount of data to store. But as we established, storage is fairly cheap compared to RAM or CPU or your time. Also, Timescale does compression. So on top of storage normally being cheap compared to memory, there's the compression algorithm. I don't know how it works, it's like magic, but it compresses everything really well, down to about half of the data. So you may just want to do that, and then the queries become super fast. I think the query cost was one hundred thousand something, and now it's around a thousand. So it massively reduced the cost because I don't have to do these JOINs on multi-million-row tables. You go from 30-second queries to one second.
Chris Engelbert: I mean, that is basically what you expect whenever you open a dashboard. You don't want to look at it for like 30 seconds just to get the data graph or the answer you're looking for, right?
Florian Herrengt: Yes, and obviously, the answer to that was so many times: “Why don't you do caching? Why don't you do caching?” Yes, but you're not really solving the problem, you're just hiding it. We have customers with a lot of data, like a lot of data. You know, two hundred thousand users and stuff like that, and we do caching. But for someone else, think about your user experience. You set up the website, and you can kind of see in real time what’s happening on your website and stuff like that when you only have thousands of visitors. And let's be honest, not many people have millions of visitors. So, having this thing where it's kind of like somewhat real time, it’s not exactly real time, but it doesn't take, you know, half an hour for your data to update. It's really nice. I think it’s a good user experience for a dashboard.
Chris Engelbert: Alright! Sounds good to me. Let me see. I think that was all the questions I had. Is there anything you want to share with the PostgreSQL or, specifically, the Timescale community from your side? Anything goes!
Florian Herrengt: What I would say is, give it a try. When I looked at it for the first time, I was very worried about it. Just because I'm very, I guess, risk-averse. I don't like to use shiny new stuff and all the latest tech and stuff like that. I used to. I got burned. Now I'm like, “Let's stick with what we know best.” And it's just PostgreSQL. Like really. It’s just PostgreSQL: if something goes wrong or you don't like it, you can immediately return to PostgreSQL. But the benefits you get, in my opinion, dramatically outweigh the risk of going for a not-so-well-established solution. At least compared to the PostgreSQL database, relatively speaking.
So give it a try. Honestly, it's one of those things where you just go and try it to see if it works, and if it works, then I don't see any reason not to use it.
And there are so many features if you look at the…What’s the name? The Toolkit?
Chris Engelbert: The Timescale Toolkit.
Florian Herrengt: You start looking at it, and you're like, “Oh, my god! There are so many things I could be doing with this.” And this kind of progression, when you start using it and keep digging into the organization, and you’re like, “Oh, wow! It has all these things I could be doing.”
I think it makes it the perfect product for growing startups or products as you learn about it. Because you don't really know what your needs will be when you start the project. You may know that you may need something like this, but then the direction is changing or whatever. Using TimescaleDB allows you—well, it allows me—to be able to react very quickly without having to rethink everything or rebuild everything.
Chris Engelbert: It makes perfect sense. And obviously, there is a great community and an awesome community team. No pun intended.
Florian Herrengt: Yes, it is really good. I mean, the [Timescale] Twitter Community is small but really nice and very responsive. I would also check out the Slack channel. I don't always reply, but I try to reply sometimes. But I’m a big fan of the #tech-design channel, where people ask questions: “Hey, how can I do this? How can I do that?” or “I have this problem. How should I design this?” I find it interesting, and it’s weird; I’m like, “Oh wow, I had no idea you could do that.”
Chris Engelbert: Yes, I totally agree with that. And there are so many different options, and I often answered on that channel, even before I joined, so I don't want to stop that. But even for me, there is a lot of learning new stuff, seeing what other people came up with as an idea or a question, because you never stop learning. There is never the point where you say, “Okay, I know everything.” There is always somebody better who has a much better idea, and I love finding myself in that situation where “Oh, man, that makes sense. Why didn't I come up with that?”
Florian Herrengt: Yeah, that happens too often. You look at it, and it’s like, “Oh, wow!”
Chris Engelbert: In the past, it always happened when you looked at new startups like, “Oh, why wasn't that my idea?” All right, I think that was a perfect closing.
One more thing that I really want to mention: you are working on a blog post that goes way deeper than anything we could talk about today. I'm not sure if it will be up with the recording or if it will be published a few days later, but whenever it's up, we are going to make sure to link to it in the description.
So if the audience wants to learn about Nocodelytics or you, ask questions about the design, and get your thoughts on something, where can people find you and the company?
Florian Herrengt: Twitter is probably the best. Just tag me on Twitter. I try to post and keep the stuff I’m posting interesting. But yeah, if you hit me up on Twitter, you can ask me any questions about this, and we'll be more than happy to answer any questions you have.
Chris Engelbert: All right. I think we will also share the links in the description so everyone can find them. It's probably easier than just trying to spell it. Again, thank you very much for your time. I loved hearing from you, sharing your experiences, and understanding how you ended up here, and I thank you for everything.
Florian Herrengt: Thank you, Chris. That was great.
Chris Engelbert: You're welcome.
If you’re building applications for a growing product or startup—just like Florian—and want to reduce your query cost while working with a database that works just like PostgreSQL but greatly expands its functionality, try TimescaleDB and self-host for free.
For a completely worry-free cloud experience offering bottomless storage, infinite scalability, and cost efficiency, spin up an instance in 30 seconds on Timescale. You can try it for free for 30 days, no credit card required.