How (and why) to become a PostgreSQL contributor
Contributing to open-source projects can be intimidating – and PostgreSQL is no exception. As a long-time PostgreSQL contributor, Aleks shares his hard-earned tips to help you make your first contribution, or start contributing more.
PostgreSQL is one of the most popular and loved databases in the world. It’s no secret that we are big fans of PostgreSQL at Timescale: We’ve built TimescaleDB on top of it, we employ open-source PostgreSQL contributors (like me!), and we’ve developed features to make using PostgreSQL better for time-series scenarios (like Skip Scan, which makes certain queries in PostgreSQL 8000x faster). But, in addition to helping improve the database itself, we’re committed to the success of the PostgreSQL community at large.
Open-source is not just my passion; it’s my career. I’ve been a PostgreSQL contributor since 2016, and recently joined Timescale as a full-time open-source PostgreSQL contributor. I’ve contributed not only to PostgreSQL but also to Insolar, Sigrok, and other open-source projects. I’m the author of pg_protobuf and ZSON extensions for PostgreSQL and several open-source libraries for STM32 microcontrollers.
I love open-source because it enables us to see what’s inside the software, learn from it, and improve it. The quality standards are higher in open-source software than in proprietary software because you can’t hide any cut corners. Last but not least, open-source software can’t refuse to sell or prolong your license because of geopolitical events or whatnot. (I encountered this at least twice in my career.)
Which brings me to the impetus for this post. Earlier this year, we ran the “State of PostgreSQL” survey to learn how people use PostgreSQL, from their community experiences to popular tools and areas to improve.
You can see the State of PostgreSQL 2021 report to explore all findings and trends – but one result stood out for me:
85% of respondents haven’t contributed to the PostgreSQL codebase, docs, or commitfests, and only 4% have contributed several times.
The survey also highlighted several places where we, as a PostgreSQL community, can be more welcoming to new developers to help them use and contribute to PostgreSQL.
For example, one respondent said: “First code contributions can be traumatic...sometimes we’re not very welcome [sic] with new developers. We should improve...”
That got me thinking about how we can make it easier for folks to overcome the initial fear and other barriers - be it technical difficulty, confusing processes, or lack of information - that often surround contributing to an open-source project. After all, we want more people to be a part of the PostgreSQL community and to make contributions; that’s how we make it even better.
To help more people get started, I wanted to share my observations, what I’ve learned over the past 5+ years, what I wish I knew when I started, and advice I typically give new contributors.
In my experience, little depends on the specifics of the project. So, while I’ll use PostgreSQL-specific examples, the following guidance is quite universal, whether you want to contribute to PostgreSQL for the first (or second, or third) time – or have another open-source project in mind.
I also included a few ways to give back to the community or help a project grow beyond code contributions: the important, yet easily overlooked, elements of building a sustainable, healthy open-source community.
Step #1: Identify your motivation
One of the most important questions to ask is: “Why do you want to be an open-source contributor?” Unless you recognize and understand your motivation, it will be difficult for you to find time for the project, especially as time goes on.
Here is a list of potential reasons why you might want to start working on an open-source project:
To gain a unique experience: If you're a backend developer who's been writing microservices for a while, you might look for a new challenge. Open-source software presents many (many!) such challenges and new technologies to learn.
To learn the internals of your favorite operating system / database / language / compiler: Understanding the internals of your favorite open-source project allows you to use it more efficiently and to learn its limitations. As an example, not many users know that running SELECT queries may cause writes to the disk by PostgreSQL. Or that creating multiple temporary tables may significantly affect the performance of the entire database. Or that synchronous_commit = remote_apply doesn’t actually wait for replicas before committing the transaction. (The transaction is committed instantaneously. The user is just not notified about this, which may cause problems.)
To work with great people: Open-source attracts some of the most talented people from around the world. There is always something you can learn from them, big or small. The original idea of the ZSON extension came from Alexander Korotkov and Teodor Sigaev, both PostgreSQL committers I was lucky to work with. ZSON is now the most popular project I have on GitHub (390+ stars at the moment of writing) – and there is a possibility that it will be shipped with PostgreSQL by default.
To make users happier: Let’s say you contributed several lines of code and, as a result, made PostgreSQL 10% faster in some scenarios. PostgreSQL is used by thousands of companies whose products are used by millions of customers. It’s satisfying to realize that your small patch made all of these users just a little bit happier, even if you might not explicitly hear from them.
To boost your resume: It’s natural to seek a job that better suits you. Several years of contributing to a well-known open-source project will open new doors for you, from more technical experience to connections with various community members who you may wind up working with later.
To be fair, there are probably dozens of other reasons why you might want to start contributing to an open-source project. This list isn’t exhaustive, and each case is unique, but I tried to distill a few of the reasons I see again and again.
Once you’ve taken some time to reflect on and establish why you want to contribute, the next step is to familiarize yourself with the project’s development process.
Step #2: Learn about the development process
Before starting the work on a new patch, there are several things to learn about the project:
- How to compile the code
- How to run the tests
- How to build the documentation
- How to debug / profile / benchmark the code
- How to format the code
- How to submit a patch
You can usually find this information, or most of it, in the project’s GitHub README or somewhere in the documentation. For PostgreSQL, look at the installation docs.
PostgreSQL is written in C, uses the GNU Autotools build system, and relies on the Perl scripting language for testing and DocBook for documentation (the source files keep their historical .sgml extension). It uses Git as the version control system, and the repository itself is self-hosted (although there is a mirror on GitHub).
PostgreSQL can be compiled and tested like this:
git clone http://git.postgresql.org/git/postgresql.git
cd postgresql
./configure --prefix=/home/user/pginstall --enable-tap-tests --enable-cassert --enable-debug
make world
make check-world
The details are a little bit more complicated, though. Firstly, you have to install several dependencies, which is done differently depending on the operating system.
For instance, on Ubuntu 20.04 LTS you will need:
# for basic build
sudo apt install gcc make flex bison libreadline-dev zlib1g-dev
# to build the documentation as well
sudo apt install docbook docbook-dsssl docbook-xsl libxml2-utils \
    openjade opensp xsltproc
Secondly, there are several common mistakes that you can make, e.g., forgetting to run make distclean after changing the header files.
There is a set of scripts on GitHub which will help you to avoid these mistakes.
Here is how to use it:
# where to install Postgres (you can add this line to your ~/.bash_profile)
export PGINSTALL="/home/user/pginstall"
# build and test Postgres
./full-build.sh
# install it to $PGINSTALL
./single-install.sh
# run a few more tests against the running Postgres
make installcheck-world
# check the documentation: open ~/pginstall/share/doc/postgresql/html/index.html
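The header-file mistake mentioned above can also be caught with a small check before rebuilding. This is only a minimal sketch and not part of the scripts mentioned above: the function name needs_distclean is made up for this illustration, and it assumes that the list of changed files comes from git diff --name-only.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL or of the scripts above):
# reads file names from stdin, one per line, and succeeds (exit 0)
# if at least one of them is a C header file.
needs_distclean() {
    grep -q '\.h$'
}

# Typical use inside a PostgreSQL checkout with uncommitted edits:
#   if git diff --name-only | needs_distclean; then
#       make distclean   # afterwards, re-run ./configure and make
#   fi
```

The check is deliberately conservative: it triggers on any header change, even in cases where a full rebuild is not strictly needed.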
Additionally, the PostgreSQL community uses mailing lists as the main communication channel for discussions and submitting and reviewing patches.
To get an idea of the types of messages, subscribe to pgsql-hackers@ (be aware that there are many messages per day on this mailing list). Two other important mailing lists are pgsql-general@ and pgsql-bugs@, and there are many others for assorted topics.
(As an aside: a number of responses in the anonymized source data of the State of PostgreSQL survey mentioned that mailing lists aren’t the friendliest way to track bugs and may be a barrier to getting involved. For example: “Mailing lists are considered "hard" by people nowadays. Not the most welcoming interface for interacting with the community for quite a large number of people I'd expect.”)
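Since patches travel by email, they are usually prepared as files with git format-patch and attached to a message. The following is a rough sketch of that step rather than an official procedure: the function name make_patch, the branch name master, and the output directory are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write the commits that are on the current branch
# but not on the base branch as numbered patch files, ready to be
# attached to an email.
#   $1 = base branch (for example "master")
#   $2 = patch version (1 for the first submission, 2 after a review round)
#   $3 = output directory
make_patch() {
    # --reroll-count (short form: -v) puts the version number into the
    # file names and into the "[PATCH vN]" subject prefix
    git format-patch --reroll-count="$2" -o "$3" "$1"
}

# Typical use inside a PostgreSQL checkout, on a feature branch:
#   make_patch master 1 /tmp/my-patch
#   ls /tmp/my-patch
```

Numbering the versions makes it easier for reviewers to tell which revision of a patch a reply refers to.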
After you’ve gotten familiar with the development process, it’s time to start thinking about ideas for your first patch.
Step #3: Identify your first patch
Your first patch doesn’t have to be anything fancy. Here are several examples that I think make good first patches, both for PostgreSQL and as general places to start:
Find and fix mistakes in comments and documentation. Start with something simple. In my experience, projects are bound to have typos and mistakes in the code comments and in the documentation. Use your favorite text editor with a spell checker to find them. This is a really great place to start and overcome the fear of contributing, because there is no risk of breaking anything. Interestingly, this is exactly how I submitted my first (and so far the only) patch to the Linux kernel. (My first patch to PostgreSQL was rather complicated and thus not very representative.)
Participate in code review, testing, and discussions. Many software developers like to write the code, but few like to review and test it. One of the most valuable contributions one can make to PostgreSQL is being a reviewer. As a reviewer, your primary task is to check that the patch compiles, passes the tests, implements the claimed functionality, and includes documentation. And, of course, it’s worth checking that it doesn’t have any obvious bugs.
Find and fix a bug. Check the bugtracker, or, in the case of PostgreSQL, check the archive of pgsql-bugs@ mailing list. Try to reproduce the bug. If it doesn’t reproduce, it might already be fixed by another patch, or maybe the steps to reproduce it aren’t very clear. In any case, reply to the mailing list to let the community know what you find. If you managed to reproduce the bug, you are lucky; from there, write a corresponding test and change the code so that it passes the test.
Find a bottleneck and optimize the code. After using a piece of software for a while, you discover cases when its performance is far from ideal. Use suitable tools (e.g., perf and eBPF) to find the bottleneck and then eliminate it. Before submitting the patch, make sure it doesn’t cause performance degradation in some other scenarios.
Write tests. Use a suitable tool to test code coverage. For PostgreSQL (C or C++), that tool would be lcov. With a code coverage report on hand, write a test that increases the code coverage.
Improve documentation. Ideally, documentation is structured in a way that allows users to download it as a PDF and read it like a book. PostgreSQL documentation is quite good in terms of covering many - many! - topics, but could benefit from more experienced/dedicated technical writers (e.g., people who could add more sample scenarios and illustrations to help new users understand concepts). With PostgreSQL, there are 71 chapters and 2300+ pages in total, and the pages mostly describe configuration parameters and query syntax vs. examples of how to solve concrete tasks. The FreeBSD Handbook comes to mind as a good “read it like a book” example.
Refactor the code. Refactoring has a clear goal: you rewrite the code so that it does the same thing but is more readable. It’s worth noting that sometimes the PostgreSQL community can be a little skeptical about the value of such patches. The reason is that the community supports the last 5 major releases of PostgreSQL, so refactorings can make backporting of bugfixes more complicated.
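For the reviewing idea above, the very first check (does the patch apply at all?) is easy to script. This is a minimal sketch under the assumption that the patch was produced with Git; the function name patch_applies and the patch path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the patch file given as $1
# applies cleanly to the current checkout. Nothing is modified.
patch_applies() {
    git apply --check "$1"
}

# Typical use inside a PostgreSQL checkout that is already configured
# (the patch file name is made up):
#   patch_applies /path/to/v1-0001-some-fix.patch &&
#       git apply /path/to/v1-0001-some-fix.patch &&
#       make && make check
```

If the check fails, that is already useful feedback for the author: the patch probably needs to be rebased onto the current state of the repository.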
I recommend accumulating small wins - by submitting several “first patch”-like contributions - before you move on to more ambitious patches. This will help you get familiar with the contribution process, tools, and community – not to mention increasing your confidence.
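For the “write tests” idea above, PostgreSQL’s build system can generate the lcov report itself. The following is a sketch, assuming gcc and lcov are installed and the source tree is already checked out; the function name pg_coverage_report and the ~/postgresql path are made up for illustration, while the --enable-coverage option and the coverage-html target come from the PostgreSQL documentation.

```shell
#!/bin/sh
# Hypothetical helper: build PostgreSQL with coverage instrumentation,
# run the core regression tests, and generate an HTML coverage report.
#   $1 = path to the PostgreSQL source tree
pg_coverage_report() {
    if [ ! -x "$1/configure" ]; then
        echo "no configure script in $1" >&2
        return 1
    fi
    (
        # the subshell keeps the caller's working directory unchanged
        cd "$1" &&
        ./configure --enable-coverage &&
        make &&
        make check &&
        make coverage-html
    )
}

# Typical use (the path is an assumption):
#   pg_coverage_report ~/postgresql
#   # then open ~/postgresql/coverage/index.html in a browser
```

Files and functions with low coverage in the report are natural candidates for a first test.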
Once you’ve settled on an idea for your first patch, you’re all set to go ahead and start contributing! (I recommend reading the following section about common mistakes to avoid first 😊).
Step #4: Prepare to contribute: how to avoid common mistakes
There are a few common “mistakes” people make when joining an open-source project, so I’ve compiled the following pieces of advice to help you avoid them.
- Start with something simple
- The smaller your patch, the better the chances that it will be accepted
- Listen carefully to the feedback on your patch
- Always be polite (that includes no sarcasm, no trolling, and so on)
- Not all patches get accepted; don’t take it personally
- Contributors are people too; most people work on open source in their spare time
- Grammarly is a great help for those of us for whom English is a second language (and native speakers too)
Thus far, I’ve focused my discussion on contributing patches with new code or bugfixes. But, submitting patches is not the be-all and end-all of contributing to open-source projects.
Next, we’ll look at how to contribute to open-source projects and the surrounding community without writing code.
Step #5: Consider contributing beyond code
There are many ways to contribute to a project besides writing the code and documentation – and these non-code contributions are invaluable.
Here are several of my favorite ideas, although the list does not claim to be complete:
Help newcomers. There are always people who have only recently started using a given open-source project. (For reference, in the State of PostgreSQL survey, almost 50% of respondents said they were new-ish to the project, with 0-5 years of experience.) Usually, there is a mailing list and/or Slack where they can ask questions. Join the corresponding channel and help newcomers.
Participate in conferences. Make a presentation on something you used, learned, or have been working on lately. Share the knowledge. For PostgreSQL, there are several popular conferences, like PGconf.asia, Postgres London, and PGconf.us, as well as many local meetups.
Create a blog / podcast / YouTube channel. This article, for instance, can be considered as a small contribution to open-source. Make sure your blog, podcast, etc., is added to the prominent community news aggregator(s), so people can learn about it. For PostgreSQL, this is PostgreSQL Planet.
Write a book. Writing a book is an ambitious and very time-consuming goal, but there is more than one way to get there. For example, Manning is a publisher well known for helping new technical writers publish their first book. Alternatively, simply make a PDF in Google Docs and distribute it for free.
Participate in Google Summer of Code or Google Season of Docs. Google Summer of Code (GSoC) is a program focused on bringing student software developers into open-source development. Participate as a student or as a mentor. If you are a technical writer, consider participating in Google Season of Docs (GSoD). See GSoC and GSoD pages on PostgreSQL Wiki for more information.
Donate hardware for the CI system. CI stands for continuous integration, and in the PostgreSQL world the CI system is called the Buildfarm. The community is interested in adding unusual platforms or combinations of architecture, operating system, and compiler to the Buildfarm. As an example, currently, there is no server with the RISC-V architecture. RISC-V is an open instruction set architecture (ISA) that gets support from many leading hardware manufacturers, especially after the discovery of the Meltdown and Spectre vulnerabilities. See Application to join PostgreSQL Buildfarm for more details.
Resources to learn more
The following additional materials are recommended for self-study in the context of contributing to PostgreSQL:
- C in a Nutshell, 2nd Edition by Peter Prinz, Tony Crawford
- Autotools Mythbuster by Diego Elio Pettenò, David J. Cozatt
- Learning Perl and Intermediate Perl by Randal L. Schwartz, et al.
- Refactoring by Martin Fowler, et al.
- Systems Performance, 2nd Edition by Brendan Gregg
- BPF Performance Tools by Brendan Gregg
- Database System Implementation by Hector Garcia-Molina, et al.
- Growing up new PostgreSQL developers, by Anastasia Lubennikova and myself. See the bonus slides with recommended white papers.
This post is merely a collection of advice, best practices, and various other things I’ve observed over the years to help would-be contributors make the jump from “never contributed” to “contributed once or twice” (and, ultimately, hopefully some make it to “contributed many times”).
There are many more technical and community experience topics that I didn't cover here. If you have something you’d like to read more about (e.g., debugging, profiling, and benchmarking PostgreSQL, or maybe about writing extensions), reach out to let me and the team know: [email protected].
We’re also looking for ways to help the community, share knowledge, and contribute to things community members are already working on.
Lastly, I’d be remiss if I didn’t mention that we’re hiring across multiple teams 🙌. TimescaleDB is the leading open-source relational database for time-series data. It’s packaged as a PostgreSQL extension (an extension like CREATE EXTENSION, not a fork, nor a set of patches).
If you know C and SQL, have experience with PostgreSQL, and want to be a full-time database developer, I encourage you to consider joining Timescale. Timescale is a remote-first company, with people located on all continents (except Antarctica – but if that’s you, we’re happy to outfit your home office with a space heater).
If you’d like to discuss technical topics with me, Timescale engineers, and other developers and community members, you can find us in the TimescaleDB Slack (8K+ members).
Happy contributing 🐘🚀