A success story: SAKURA internet

SAKURA internet is a leading Internet infrastructure service provider for businesses and individuals in Japan. They take pride in offering high-quality, low-cost IT platforms and providing outstanding operational performance for their customers. SAKURA internet relies on TimescaleDB as a stable and reliable time-series database that stores all their traffic volume trend data.

User: Tamihiro Lee
Title: Network Engineer
Company: SAKURA internet
Industry: Information Technology
Use cases: Network monitoring

“When I first got my hands on TimescaleDB, its sheer speed and the fact that it was a PostgreSQL extension was enough to impress me big time. For those looking to use TimescaleDB, you don’t have to be afraid of performance degradation normally associated with PostgreSQL clusters. TimescaleDB scales with you and your project.”

- Tamihiro Lee, Network Engineer, SAKURA internet

Why SAKURA internet needs TimescaleDB

Founded in 1996, SAKURA internet operates three advanced earthquake-resistant data centers across Japan, providing housing and hosting services for some of the country's largest customers. They run one of the highest-capacity backbones in Japan, making them an ideal environment for businesses looking for mission-critical hosting services.

To better understand network traffic and make informed business decisions for their customers, SAKURA internet built an application designed to collect and aggregate routing information. In practice, all the routers deployed in their network export traffic flow samples to a collector where the application resides.

The SAKURA internet team’s goal was to achieve a clear view of traffic trends in their network backbone for 1) capacity planning, 2) peering coordination, and 3) real-time analysis. To meet these requirements, SAKURA internet became an early adopter of TimescaleDB due to its full SQL support, its ability to return quick responses to complex queries, and an architecture built to ingest large amounts of time-series data at scale.

Achieving high insert rates and stable performance

Prior to using TimescaleDB, the SAKURA internet team used RRDtool to store traffic volume trend data. However, RRDtool presented several operational challenges and required a lot of manual servicing. The team also evaluated InfluxDB, but found that it was unable to satisfy their ingest requirements. They started to search for an alternative time-series database and came across TimescaleDB.

Since they were already using PostgreSQL, TimescaleDB seamlessly fit into their architecture. However, the team knew that when their insert rates grew substantially in PostgreSQL, the performance of their database would begin to degrade. Fortunately, TimescaleDB’s architecture is built to sustain high insert rates while maintaining consistent performance. These two characteristics were enough to impress the team and make SAKURA internet an early adopter back in March of 2018 (less than a year after TimescaleDB officially launched).

Instead of inserting raw traffic samples directly into their database, their in-house application collects and aggregates one minute of samples, with many metrics, and inserts the aggregated data into TimescaleDB. Currently, their insert rates are about 150-350K rows per minute, with each batch completing in 2-5 seconds.
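The article doesn’t show their schema or client code, but as a minimal sketch of this ingest pattern, the following assumes a hypothetical traffic_1m hypertable and a Python client using psycopg2 (the real table layout, column names, and application language are not described in the source):

```python
# Minimal sketch of batched per-minute inserts into a TimescaleDB hypertable.
# Table name, columns, and connection string are illustrative assumptions.
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=netflow user=collector")
with conn, conn.cursor() as cur:
    # One-time setup: a plain table converted into a hypertable partitioned on time.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS traffic_1m (
            bucket     timestamptz NOT NULL,
            device     text        NOT NULL,
            in_port    int,
            out_port   int,
            src_prefix cidr,
            dst_prefix cidr,
            bytes      bigint,
            packets    bigint
        );
    """)
    cur.execute("SELECT create_hypertable('traffic_1m', 'bucket', if_not_exists => TRUE);")

    # Every minute, the pre-aggregated rows are written in one batched statement
    # rather than as hundreds of thousands of single-row inserts.
    rows = [
        # (bucket, device, in_port, out_port, src_prefix, dst_prefix, bytes, packets)
        ("2020-01-01 00:01:00+09", "router1", 1, 2, "192.0.2.0/24", "198.51.100.0/24", 123456, 789),
    ]
    execute_values(
        cur,
        """INSERT INTO traffic_1m
           (bucket, device, in_port, out_port, src_prefix, dst_prefix, bytes, packets)
           VALUES %s""",
        rows,
        page_size=10_000,
    )
```

The key design point from the article is that aggregation happens before insertion, so each minute's write is a single large batch rather than many small ones.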

Additionally, since TimescaleDB has full SQL support, SAKURA internet leverages subselects, outer JOINs, and conditional expressions when querying data.
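Their actual queries aren’t published; purely as an illustration of the kind of SQL this enables, here is a hedged sketch against the hypothetical traffic_1m table from the previous example, combining a subselect, an outer JOIN against an assumed devices metadata table, and a CASE expression:

```python
# Illustrative only: none of these table or column names come from the article.
import psycopg2

conn = psycopg2.connect("dbname=netflow user=collector")
query = """
    SELECT d.name,
           COALESCE(t.total_bytes, 0) AS total_bytes,
           CASE WHEN COALESCE(t.total_bytes, 0) > 1e12
                THEN 'hot' ELSE 'normal' END AS load_level
    FROM devices AS d
    LEFT OUTER JOIN (
        -- subselect: last 24 hours of traffic summed per device
        SELECT device, sum(bytes) AS total_bytes
        FROM traffic_1m
        WHERE bucket > now() - interval '1 day'
        GROUP BY device
    ) AS t ON t.device = d.name
    ORDER BY total_bytes DESC;
"""
with conn, conn.cursor() as cur:
    cur.execute(query)
    for name, total_bytes, load_level in cur.fetchall():
        print(name, total_bytes, load_level)
```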

Building a high-performance infrastructure monitoring stack

The SAKURA internet team has standardized on Grafana as their visualization tool. Since TimescaleDB has a native integration with Grafana, the team was able to leverage the combined power of both tools to visualize traffic volume. For example, they might want to see outgoing traffic from a specific router’s port for a defined period: they run the query against TimescaleDB, which is connected to their Grafana dashboard, and Grafana displays the breakdown.
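As a rough illustration of that kind of panel query (again using the hypothetical traffic_1m schema rather than their real one), outgoing traffic on one port can be bucketed with TimescaleDB’s time_bucket function; in an actual Grafana PostgreSQL panel, the time range would typically come from Grafana’s $__timeFilter macro rather than a hard-coded interval:

```python
# Sketch of a per-port traffic query of the kind a Grafana panel might run.
# Device and port values are placeholders; the schema is the hypothetical one above.
import psycopg2

conn = psycopg2.connect("dbname=netflow user=collector")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT time_bucket('5 minutes', bucket) AS t,
               sum(bytes) * 8 / 300.0           AS avg_bits_per_sec  -- bytes per 5 min -> bit/s
        FROM traffic_1m
        WHERE device = %s
          AND out_port = %s
          AND bucket > now() - interval '6 hours'
        GROUP BY t
        ORDER BY t;
    """, ("router1", 2))
    for t, bps in cur.fetchall():
        print(t, bps)
```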

In addition to Grafana and TimescaleDB, SAKURA internet’s stack also consists of the in-house application they built themselves, sFlow (a technology for monitoring high-speed switched networks), and GoBGP (an open source BGP daemon designed to respond quickly to requests for network routing information via a gRPC-based API).

At a high level, this is how their application stack works: SAKURA internet has dozens of monitored devices in their IP network backbone, each of which exports sampled traffic flow data as sFlow datagrams to their in-house application, which acts as an sFlow collector. The collector receives sFlow datagrams at about 4K to 10K FPS. Their application attaches a timestamp, which the raw samples lack, to each sample and collects additional information called extended_gateway_data. SAKURA internet’s app acts as a gRPC client that queries GoBGP for this missing data about 2K to 4K times per second.
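The collector code itself isn’t published. As a very rough, assumption-heavy outline of what such a GoBGP lookup over gRPC can look like from Python: the stub, message, and enum names below are taken from GoBGP’s protobuf definitions as best understood, and they vary by GoBGP version, so treat this as a sketch to check against the gobgp.proto you actually generate stubs from:

```python
# Rough sketch: querying GoBGP's gRPC API for the route covering a sampled address.
# Module, message, and field names are assumptions that depend on the GoBGP
# version whose .proto files the Python stubs were generated from.
import grpc
import gobgp_pb2 as api            # generated from GoBGP's gobgp.proto
import gobgp_pb2_grpc as api_grpc

channel = grpc.insecure_channel("127.0.0.1:50051")   # GoBGP's default gRPC port
stub = api_grpc.GobgpApiStub(channel)

request = api.ListPathRequest(
    table_type=api.GLOBAL,                            # look in the global RIB
    family=api.Family(afi=api.Family.AFI_IP, safi=api.Family.SAFI_UNICAST),
    prefixes=[api.TableLookupPrefix(prefix="198.51.100.10/32")],
)
# ListPath is a server-streaming RPC; each response carries one destination.
for response in stub.ListPath(request):
    print(response.destination.prefix, len(response.destination.paths), "path(s)")
```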

Their application aggregates the received sFlow data every minute, using attributes such as source and destination network prefixes and device input and output ports as aggregation keys for estimated traffic volume. These aggregations become the source of the data stored in TimescaleDB.
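A minimal sketch of that per-minute aggregation step might look like the following; the sample field names and the flush mechanism are illustrative, not taken from SAKURA internet’s application:

```python
# Fold decoded sFlow samples into per-minute buckets keyed by
# (minute, device, in_port, out_port, src_prefix, dst_prefix).
from collections import defaultdict
from datetime import datetime, timezone

buckets = defaultdict(lambda: {"bytes": 0, "packets": 0})

def add_sample(sample):
    """Add one decoded sFlow sample (a dict) to the current minute's aggregates."""
    minute = datetime.now(timezone.utc).replace(second=0, microsecond=0)
    key = (minute, sample["device"], sample["in_port"], sample["out_port"],
           sample["src_prefix"], sample["dst_prefix"])
    # Scale each sampled frame by the sampling rate to estimate the real volume.
    buckets[key]["bytes"] += sample["frame_bytes"] * sample["sampling_rate"]
    buckets[key]["packets"] += sample["sampling_rate"]

def flush():
    """Return rows ready for a batched insert into TimescaleDB, then reset."""
    rows = [key + (agg["bytes"], agg["packets"]) for key, agg in buckets.items()]
    buckets.clear()
    return rows
```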

TimescaleDB is their main data store, and the team connects Grafana to TimescaleDB, where they can build dynamic dashboards to visualize all the metrics they are collecting.

Leveraging native compression for storage savings

As demonstrated above, the team at SAKURA internet is collecting and storing a lot of data -- in fact, they are currently storing about 6.7 TB of data within TimescaleDB.

In order to accommodate these large data workloads, they previously leveraged a PostgreSQL cluster on a ZFS volume for storage. A member of the team, Tamihiro, was an early beta tester of TimescaleDB’s native compression capabilities and saw some pretty substantial results.

94%
SAKURA internet’s storage savings with TimescaleDB compression

When compressing a single hypertable in TimescaleDB, he received the following compression ratios:

Uncompressed hypertable: 1396 GB
Compressed hypertable: 77.0 GB
Storage savings: 94%

Now that this feature is generally available, the team at SAKURA internet can leverage native compression to significantly save on storage costs while continuing to collect terabytes of time-series data.
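For reference, turning on native compression is a couple of SQL statements; the sketch below uses the hypothetical traffic_1m table from earlier with assumed segmentby, orderby, and policy settings, not SAKURA internet’s actual configuration (and the policy function name differs across TimescaleDB versions; add_compression_policy is the current one):

```python
# Enable native compression on a hypertable and compress chunks older than a week.
# Settings are illustrative assumptions, not SAKURA internet's configuration.
import psycopg2

conn = psycopg2.connect("dbname=netflow user=collector")
with conn, conn.cursor() as cur:
    cur.execute("""
        ALTER TABLE traffic_1m SET (
            timescaledb.compress,
            timescaledb.compress_segmentby = 'device',
            timescaledb.compress_orderby   = 'bucket DESC'
        );
    """)
    cur.execute("SELECT add_compression_policy('traffic_1m', INTERVAL '7 days');")
```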

“Compression ratio is jaw-droppingly high :)”
- Tamihiro Lee, Network Engineer, SAKURA internet

Traffic flows smoothly with TimescaleDB

TimescaleDB enables SAKURA internet’s team to feel confident in the stability of their architecture, so they can continue to provide leading internet services throughout Japan. Additionally, their business can continue to expand on an infrastructure that is able to grow with them.

Identify with SAKURA internet's use case?
Contact our team to learn more
