SQL Isn't That Hard (When You Have a Good Problem to Solve)
David Kohn, Education Engineering Manager at Timescale
How does an electrochemist learn SQL? And how hard was it to learn? Always keen on automating processes to escape repetitive tasks and learning new ways to handle data, David Kohn began professionally tinkering with databases while working for a battery manufacturing startup in New York.
Fast forward a few years later, he met Timescale’s co-founders at a PGConf and became one of the first users of the company’s open-source relational database before joining the startup a few months later.
Currently the engineering education manager at Timescale, a role in which he strives to improve how we teach and learn about Postgres, Timescale, and all things time series for developers inside and outside the company. David is also the go-to person for dessert recipes—a short stint as assistant pastry chef left him with impressive crème brûlée making skills, and while his chopping expertise is not something he often uses for work, his knack for teaching made him the perfect first guest of this new series.
Today, we’re speaking with David Kohn. We hope this series helps you learn about new tools, training, and applications for PostgreSQL.
When did you get into SQL? Can you give us a brief overview of your history with this query language?
I studied Environmental Engineering and ended up doing some research at a battery lab. I founded my own company and got some traction, later joining a battery manufacturing startup. These electrochemists knew much more than I did, but they weren’t computer people. And I had done a little bit of coding when I was an undergrad and took a few classes in computing for engineers.
One of the things I noticed was that they were still downloading Excel files to analyze hundreds of cells. These were experimental cells that we put together to figure out battery formulations that would work better. So there were large battery test setups that would run for a long time and generate a fair amount of data—and we needed a better way to store and analyze that data. And so I started getting interested in databases because of that. I eventually fell into PostgreSQL and later joined an ad analytics startup where I worked with PostgreSQL.
As someone without a computing background, how was your learning curve?
I think it was somewhat hard but not crazily hard. People sometimes struggle with learning something because they don’t have a good problem to work on. I had a problem I was trying to solve, which was how to make that data more accessible.
✨ Editor’s Note: Looking for fun problems to solve? Check out our tutorials for building time-series applications.
During the learning process, I would run into frustrations or things that didn’t do as well as I was hoping to. I think I started in FileMaker because that was what someone at the company knew about. Like I said, not computer folks. We were battery folks. I quickly discovered that that wasn’t great for what I was trying to do.
I looked at MongoDB a little bit, but I think I needed something with data types and stuff, so I then went to MySQL for a little while and eventually switched to PostgreSQL.
“People sometimes struggle with learning something because they don’t have a good problem to work on. I had a problem I was trying to solve, which was how to make that data more accessible”
Why the switch?
In some ways, figuring out which one to use was one of the harder challenges. I think I read some stuff online comparing the two because I was trying to make a decision. Some people were arguing for PostgreSQL, and I liked their arguments. There were elements like strong typing and various other things that made sense from a developer's perspective, including a security standpoint. And it seemed like there were people who were passionate about it, there was a strong sense of community. Also, MySQL was in a little bit of a decline at that point: they’d been bought by Sun [Microsystems] and then Oracle and the community around it wasn’t as strong.
JSON was another factor. This was around 2014-15, and I was thinking about using some JSON at the time—PostgreSQL had JSON support that MySQL didn’t have. I knew I would be ingesting some things from different formats, so there were various aspects that led me to that decision.
Backing up a little, how did you learn SQL? Did you teach yourself or take a course?
One of the things that definitely drew me to PostgreSQL was the documentation. So I mostly taught myself by reading the documentation, watching some videos, and trying to solve my problems. The PostgreSQL documentation is pretty well written, and if you’re willing to spend some time studying it a bit, you can get pretty far.
And for a long time, I understood how PostgreSQL worked at a very high level, but I knew very little of the actual code. I still don’t know the PostgreSQL codebase that well. Another thing that was foundational to me was taking some courses on relational database theory: what the relational model is, what transactionality means, why it’s important, all of that stuff. Some of the stuff I covered in my first two lessons of the Foundations [of PostgreSQL and TimescaleDB] course.
Bruce Momjian’s videos on the internals of PostgreSQL were incredibly helpful for understanding what’s going on. Everything makes more sense. Once you understand just a little bit of what's going on behind the scenes—you don’t need to understand the exact mechanics, just that some of those things are happening—everything becomes easier. Bruce does a great job; he has some lovely explanatory videos about different index types and how the parser, planner, and executor interact, among other things. His teaching background shows, and he was definitely a huge help when I was learning things.
How long did it take you to actually be proficient in SQL?
So, three to six months on FileMaker and MySQL. I found PostgreSQL probably three to six months in, and then another six to eight months, I’d say, and I was feeling reasonably proficient.
Tell us more about how you have used SQL throughout your career.
I’ve often used PostgreSQL as a tool for other things, which I think is so underrated. I guess I still bring up sort of an odd perspective. Maybe I think a little bit about databases like I’m a chemist, right?
For instance, one of the things that we did at Moat (the ad analytics company) was an app written in PHP that would access our databases. We had various backends storing different parts of the longer-term stores, real-time data, and other things. It would call out to them and then get responses and try to stitch them together in memory, which meant that, to an extent, we were writing joins and other sorts of stuff in PHP.
And it turns out that’s a bad idea. PHP is not optimized for that, and also, when you’re writing your own joins, they’re not likely better than the optimized C code that many Computer Science Ph.Ds. have worked on for 30 years. We were running into all sorts of problems with this, especially as we got larger periods of data and more clients. Looking back, what was that solution doing? It was trying to allocate more memory than had ever been in the history of computing at the time—of course something would go horribly wrong. There is no way a computer can have that much memory.
“As far as the tools that I use, I love Postico, which is Mac OS only, but it’s a lovely database explorer, almost like a graphic user interface (GUI) for a database”
So, what we did was write SQL queries to redo that. We had these ephemeral PostgreSQL instances that had no data on them. There was no actual data that they were storing. We’re just using them as a computation and join engine. And we would essentially take data from our other databases and put them into like temporary tables there and then perform the joins like that, serve the results through PHP, and it was great! All of those out-of-memory errors were gone. Those databases were incredibly easy to run: you could quickly scale them up and down because there was no data on them—it was freeing. We even came up with some interesting ways to do debugging. So this is one of the things that I do, and that is underutilized: figuring out how you can use PostgreSQL to make your life easier on the front end. If you manage to use it with your API, it can be a powerful tool.
I think that one of the things about PostgreSQL that people are beginning to realize is that it’s incredibly versatile. This is part of what happens when you’ve been around for 30 years, and lots of people contribute to it. There are various use cases PostgreSQL is good at, but there are still creative ways to use it, and maybe the people who came up with it didn’t consider it, right?
What are your frustrations with PostgreSQL and, on the other hand, your favorite tools?
MySQL had a really great tool; I think it was called MySQL Workbench, and there’s never been anything like that for PostgreSQL. From the user perspective, in terms of doing migrations and unified modeling language (UML) diagrams and using that to then update a schema, it was a really awesome tool.
As far as the tools that I use, I love Postico, which is Mac OS only, but it’s a lovely database explorer, almost like a graphic user interface (GUI) for a database. I use psql, which a lot of the more hardcore PostgreSQL folks use. I prefer the GUI stuff. I’ve always been more visual—I think a lot of information can be conferred visually. It just needs to be designed well. And some of the PostgreSQL tools have never had a great design, no offense. I haven’t tried pgModeler in a while, and I want to do that. It’s a tool that helps do those sorts of UML diagrams and syncing for schema design, which I like.
Another tool I use all the time is explain.depesz.com, which is a tool for helping you read an EXPLAIN plan. It’s still bookmarked, and I think it’s one of the better ones. EXPLAIN is a sort of built-in tool in PostgreSQL that’s incredibly useful. For understanding what's going on in a query, pg_stat_statements. So these are some of the things I use.
You mentioned before that you thought PostgreSQL had a great community. Were there particular folks who contributed to that sense for you?
Oh man, so many! I think at the beginning would be Bruce Momjian, whose talks/videos I mentioned before. He made things available and accessible, and I eventually also got to meet him at conferences, etc. Then there were folks like Corey Huinker and Jim Nasby who were really encouraging of crazy ideas as I was learning more. And folks like Melanie Plageman, who was sort of entering the community around the same time as I was, and she has been going on to really thrive. But it’s so great to see other folks joining and getting to know the community a bit. And that’s just a small selection of people who I connected with particularly; there are so many others who were pretty universally kind and encouraging and really helpful as I’ve grown (Heikki Linnakangas, David Fetter, Robert Treat, Andres Freund, Peter Eisentraut, Thomas Munro, Peter Geoghegan, Robert Haas, Dave Cramer, Tomas Vondra, and so many more!). Sometimes folks can seem a little standoffish at first, but when you get to know them, they’re really lovely. And by the way, if you’ve got any feedback about our courses or blog posts, please reach out!
Take the State of PostgreSQL 2022 survey! It is open until June 30 at 11:59 p.m. PT.