How NLP Cloud Monitors Their Language AI API
This is an installment of our “Community Member Spotlight” series, where we invite our customers to share their work, shining a light on their success and inspiring others with new ways to use technology to solve problems.
In this edition, Julien Salinas, full-stack developer, founder, and chief technology officer (CTO) at NLP Cloud, talks about how his team is building an advanced API to perform natural language processing (NLP) tasks while taking care of the complexity of AI infrastructures for developers.
About the Company
NLP Cloud is an advanced API for text understanding and generation in production. The most recent AI models are easy to use on NLP Cloud (GPT-NeoX 20B, GPT-J, Bart Large, among others). Thanks to the API, you can perform all kinds of natural processing language processing tasks: text summarization, paraphrasing, automatic blog post generation, text classification, intent detection, entity extraction, chatbots, question answering, and much more!
Today, more than 20,000 developers and data scientists use NLP Cloud successfully in production! They love NLP Cloud because they don’t want to deal with MLOps (DevOps for machine learning) by themselves. NLP Cloud takes care of the complex infrastructure challenges related to AI (GPU reliability, redundancy, high availability, scaling, etc.).
About the Team
My name is Julien Salinas, and I’m a full-stack developer, founder, and CTO of NLP Cloud. Our company has a team of six high-level engineers skilled in NLP, DevOps, and low-level optimization for machine learning.
The team works hard to provision state-of-the-art NLP models like GPT-NeoX (equivalent to OpenAI’s GPT-3) and make sure that these models run reliably in production and at an affordable cost.
About the Project
We realized we needed a time-series database when our users asked us for a pay-as-you-go plan for GPT-J, one of our open-source NLP models. They wanted to be charged based on the number of words they’re generating, the same way OpenAI does with their GPT-3 API. Our users also wanted to monitor their usage through their NLP Cloud dashboard.
So, we started implementing TimescaleDB to log the following:
- The number of API calls per user and API endpoint
- The number of words sent and generated per user by GPT-J and GPT-NeoX
- The number of characters sent and received by our multilingual add-on
We had two main requirements:
- Writing the data had to be very fast in order not to slow down the API
- Querying the data had to be easy for our admins to quickly inspect the data when needed and easily show the data to our customers on their dashboard
Choosing (and Using!) TimescaleDB
I found out about Timescale by looking for InfluxDB alternatives. I found the Telegraf, Influx, and Grafana (TIG) stack quite complex, so I was looking for something simpler.
“TimescaleDB is a cornerstone of our pay-as-you-go plans”
The top factors in my decision for Timescale were the following:
- Easy data downsampling thanks to continuous aggregates
- PostgreSQL ecosystem: no need to learn something new, and we were all already skilled in SQL and PostgreSQL, so it saved us a lot of time and energy
We use TimescaleDB behind our natural language processing API to track API usage. Based on that, we can do analytics on our API and charge customers depending on their consumption. TimescaleDB is a cornerstone of our pay-as-you-go plans. Most of our users select such plans.
If you want to see how we do it, I detailed how we use TimescaleDB to track our API analytics in a previous blog post.
“The greatest TimescaleDB feature for us is the ability to automatically downsample data thanks to continuous aggregates”
Before using TimescaleDB, we did very naive analytics by simply logging every API call into our main PostgreSQL database. Of course, it had tons of drawbacks. We had always known it would be a temporary solution as long as the volume of API calls remained reasonably low (right after launching the API publicly), and we quickly switched to TimescaleDB as soon as possible.
We also evaluated a TIG solution (InfluxDB) but found that the complexity was not worth it. If TimescaleDB did not exist, we would maybe stick to a pure log-based solution backed by Elasticsearch.
Current Deployment and Future Plans
We use TimescaleDB as a Docker container automatically deployed by our container orchestrator. Two kinds of applications insert data into TimescaleDB: Go and Python microservices. To visualize the data, we’re using Grafana.
The greatest TimescaleDB feature for us is the ability to automatically downsample data thanks to continuous aggregates. We’re writing a lot of data within TimescaleDB, so we can’t afford to keep everything forever, but some high-level data should be kept forever. Before that, we had to develop our own auto-cleaning routines on PostgreSQL: it was highly inefficient, and some of our read queries were lagging. It’s not the case anymore.
The NLP Cloud API is evolving very fast. We are currently working hard on multi-account capabilities: soon, our customers will be able to invite other persons from their team and manage multiple API tokens.
In the future, we also plan to integrate several new AI models and optimize the speed of our Transformer-based models.
Advice and Resources
We recommend Timescale to any development team looking for a time-series solution that is both robust and easy to deal with. Understandably, most developers don’t want to spend too much time implementing an analytics solution. We found that TimescaleDB was simple to install and manage for API analytics, and it scales very well.
The TimescaleDB docs are a very good resource. We didn’t use anything else.
My advice for programmers trying to implement a scalable database strategy? Don’t mix your business database or online transaction processing (OLTP) with your online analytical processing or analytics database (OLAP).
It’s quite hard to efficiently use the same database for both day-to-day business (user registration and login, for example) and data analytics. The first one (OLTP) should be very responsive if you don’t want your user-facing application to lag, so you want to avoid heavy tasks related to data analytics (OLAP), as they are likely to put too much strain on your application.
Ideally, you want to handle data analytics in a second database that is optimized for writes (like TimescaleDB) and is perfectly decoupled from your OLTP database. The trick then is to find a way to properly move some data from your OLTP database to your OLAP database. You can do this through asynchronous extract, transform, and load (ELT) batch jobs, for example.
We’d like to thank Julien and his team at NLP Cloud for sharing their story and writing a blog post on how the NLP Cloud Team uses TimescaleDB to track their API analytics.
We’re always keen to feature new community projects and stories on our blog. If you have a story or project you’d like to share, reach out on Slack (@Ana Tavares), and we’ll go from there.