Analyzing Ethereum, Bitcoin, and 1200+ other Cryptocurrencies using PostgreSQL
Cryptocurrencies are fueling a modern day gold rush. Can data help us better understand this evolving market?
[Update: Thank you everyone for making this #1 on Hacker News! Read the HN discussion here.]
Original post by TimescaleDB intern Sarah Pan
Lately it seems like money has been growing on trees.
We live in the age of digital currencies, with cryptocurrencies birthed within the decade. Yet already, there are more than a thousand cryptocurrencies in the market and an initial coin offering (ICO) almost daily.
As we embrace this new, proliferous market, it’s important that we try to understand what’s going on. There are many risks to observe at both the micro-level (e.g., personal investments) and macro-level (e.g., prevention of market crashes and major loss of capital). That’s where we come in.
We’re data people. Specifically, we’re the developers of TimescaleDB, a new open source time-series database built up from PostgreSQL. And we thought it would be insightful (and fun) to analyze the cryptocurrency market using PostgreSQL and TimescaleDB (plus R for data visualization).
For this analysis*, we looked at historical OHLCV price data on over 1200 cryptocurrencies (as of 6/26/2017; courtesy of CryptoCompare). While our current dataset represents only a daily record of rates, TimescaleDB scales easily to much finer-grained historical data. With the constant influx of new coins and exchanges, TimescaleDB can provide a reliable foundation for time-series data in the cryptocurrency market.
Here’s what you should take away from this post:
- Several high-level insights into the cryptocurrency market
- A better understanding of how TimescaleDB + PostgreSQL make time-series data analysis easier
- Instructions on how to load this dataset yourself and draw your own insights (and perhaps find your own arbitrage opportunities!)
*Please note that nothing in this analysis should be construed as financial advice. If you’d like to draw your own conclusions, here are instructions to install TimescaleDB and download this cryptocurrency dataset.
So if you had invested $100 in Bitcoin 7 years ago, it would be worth…
Let’s start with some good old-fashioned FOMO. If you know anything about cryptocurrencies, you’ve probably heard of Bitcoin, the “granddaddy” of all cryptocurrencies. Turns out that if you had invested $100 in Bitcoin in July 2010, it would be worth over $5,000,000 today.
Bitcoin has had a pretty nice run since then (although taking a small dip recently):
-- BTC USD prices by two week intervals SELECT time_bucket('14 days', time) as period, last(closing_price, time) AS last_closing_price FROM btc_prices WHERE currency_code = 'USD' GROUP BY period ORDER BY period;
Using PostgreSQL, we’ve queried BTC’s prices at 2-week intervals, analyzing the rates for USD exchanges. (Note that “time_bucket” and “last” in this query are special TimescaleDB time-series data analysis functions not in PostgreSQL.)
But hopefully you didn’t buy in February 2014…
It hasn’t exactly been a smooth ride for BTC. Let’s hone in on the day-by-day volatility of BTC. Here we calculate daily returns using the power of PostgreSQL window functions:
-- Daily BTC returns by day SELECT time, closing_price / lead(closing_price) over prices AS daily_factor FROM ( SELECT time, closing_price FROM btc_prices WHERE currency_code = 'USD' GROUP BY 1,2 ) sub window prices AS (ORDER BY time DESC);
As a relatively new market, bitcoin prices are notably subject to volatile fluctuations. While a steady increase in price marks the success of BTC, the highest spike occurred in early 2014. If we zoom in on 2014, we notice that the jump occurred specifically between February and March of 2014. For those who invested at the peak of this market, the price soon stabilized, forcing investors who bought then to hold for a long time to see returns.
Goodbye China, hello Japan
The cryptocurrency market is global. When looking at trade volumes by currency, we noticed something interesting:
-- BTC trading volumes by currency SELECT time_bucket('14 days', time) as period, currency_code, sum(volume_btc) FROM btc_prices GROUP BY currency_code, period ORDER BY period;
The year 2014 saw a minor jump for Bitcoin rates in China, presumably caused by the devaluation of the yuan and weakening domestic stock market. This was followed by a subsequent boom in 2016 and early 2017, as Chinese volume dominated Bitcoin trades.
Within a few months, the volumes dropped dramatically.
Why? This is where we had to step out of the numbers and do some old-fashioned research. (And what we found shows how you can’t just rely on quantitative data when trying to understand this market.)
In early 2017 the People’s Bank of China began reinforcing regulations and legal liabilities for risky cryptocurrency exchanges. By February, two of the largest Chinese exchanges (OKCoin and Huobi.com) had suspended withdrawals and by mid-2017, Chinese transactions had dried up. From there, Japan became the leader in bitcoin transactions by volume, even going so far as to recognizing bitcoin as legal currency in April 2017.
Now, if you had invested $100 in ETH in January 2017…
Don’t worry if you didn’t hop onto the Bitcoin train in 2010. As volatile as Bitcoin has been, some would argue that Ethereum has been a crazier ride (and the recent “correction” shows it). Let’s look at the Ethereum price over time in Bitcoin (as it’s normally quoted):
-- ETH prices in BTC by two week intervals SELECT time_bucket('14 days', c.time) as period, last(c.closing_price, c.time) AS last_closing_price_in_btc FROM crypto_prices c WHERE c.currency_code = 'ETH' GROUP BY period ORDER BY period;
But as we know, Bitcoin itself has been fairly volatile, which renders the above chart less useful. So let’s look at ETH prices in fiat currencies, using each day’s BTC to fiat exchange rates. (Taking advantage of Postgres JOINs and some fancy filters):
-- ETH prices in BTC, USD, EUR, and CNY by two week intervals SELECT time_bucket('14 days', c.time) as period, last(c.closing_price, c.time) AS last_closing_price_in_btc, last(c.closing_price, c.time) * last(b.closing_price, c.time) filter (WHERE b.currency_code = 'USD') AS last_closing_price_in_usd, last(c.closing_price, c.time) * last(b.closing_price, c.time) filter (WHERE b.currency_code = 'EUR') AS last_closing_price_in_eur, last(c.closing_price, c.time) * last(b.closing_price, c.time) filter (WHERE b.currency_code = 'CNY') AS last_closing_price_in_cny FROM crypto_prices c JOIN btc_prices b ON time_bucket('1 day', c.time) = time_bucket('1 day', b.time) WHERE c.currency_code = 'ETH' GROUP BY period ORDER BY period;
In its first year, ETH surpassed any yearly BTC growth rate in all of BTC history — a hefty 530% surge in average closing price from the previous year marked a good start. Cumulatively, the growth has since fallen to 200% going from 2016 to 2017, though still an impressively high rate for any other asset. And within the last half year, ETH prices have increased by 3000%. So, if you had invested $100 in ETH in January 2017 (less than 7 months ago), it would be worth over $3,000 today.
Projecting the price of ETH in these stable currencies (USD, EUR, CNY), it appears that the trend lines remain consistent between the three fiat monies. A clear progression is apparent in the steep uprise within the last six months and trends reflect a massive growth for the coin when quoted in all currencies, except BTC. Relative to the fiat charts, the ETH/BTC chart is quite unstable due to the fluctuating price of BTC over the years. As a result, the representation of ETH by BTC price inflates the variability of ETH. Clearly BTC is still too immature to be considered a base currency.
What about the 1200 other cryptocurrencies?
With that brief examination of BTC and ETH trends, hopefully you have more context into the hectic world of cryptocurrencies. So what do we do with the other 1200 cryptocurrencies?
Well first, let’s use our dataset to trace the lineage of these cryptocurrencies:
(Disclaimer: our dataset represents when we first have recorded data, which may not necessarily correspond to the ICO.)
-- Currencies sorted by first time we have data for them SELECT ci.currency_code, min(c.time) FROM currency_info ci JOIN crypto_prices c ON ci.currency_code = c.currency_code AND c.closing_price > 0 GROUP BY ci.currency_code ORDER BY min(c.time) DESC;
currency_code | min ---------------+------------------------ CIR | 2017-06-26 20:00:00+00 MDC | 2017-06-26 20:00:00+00 WBTC | 2017-06-26 20:00:00+00 NET | 2017-06-26 20:00:00+00 NAS2 | 2017-06-26 20:00:00+00 TPAY | 2017-06-26 20:00:00+00 MRSA | 2017-06-26 20:00:00+00 XCI | 2017-06-26 20:00:00+00 PAY | 2017-06-19 20:00:00+00 SNM | 2017-06-18 20:00:00+00 LGD | 2017-06-17 20:00:00+00 SNT | 2017-06-16 20:00:00+00 IOT | 2017-06-12 20:00:00+00 QRL | 2017-06-09 20:00:00+00 MGO | 2017-06-09 20:00:00+00 CFI | 2017-06-08 20:00:00+00 VERI | 2017-06-07 20:00:00+00 EQT | 2017-06-07 20:00:00+00 ZEN | 2017-06-05 20:00:00+00 BAT | 2017-05-31 20:00:00+00
It’s an evolving market. And one with no clear ceiling, as we can see when we query the number of new cryptocurrencies by day. Above are just the most recent 20 records, showing how many new currencies amass each week.
Here’s a look at that same day, but counting the number of new currencies with data each day:
-- Number of new currencies with data sorted by day SELECT day, COUNT(code) FROM ( SELECT min(c.time) AS day, ci.currency_code AS code FROM currency_info ci JOIN crypto_prices c ON ci.currency_code = c.currency_code AND c.closing_price > 0 GROUP BY ci.currency_code ORDER BY min(c.time) )a GROUP BY day ORDER BY day DESC;
day | count ------------------------+------- 2017-06-26 20:00:00+00 | 8 2017-06-19 20:00:00+00 | 1 2017-06-18 20:00:00+00 | 1 2017-06-17 20:00:00+00 | 1 2017-06-16 20:00:00+00 | 1 2017-06-12 20:00:00+00 | 1 2017-06-09 20:00:00+00 | 2 2017-06-08 20:00:00+00 | 1 2017-06-07 20:00:00+00 | 2 2017-06-05 20:00:00+00 | 1 2017-05-31 20:00:00+00 | 5 2017-05-28 20:00:00+00 | 29 2017-05-27 20:00:00+00 | 13 2017-05-26 20:00:00+00 | 32 2017-05-25 20:00:00+00 | 303 2017-05-16 20:00:00+00 | 1 2017-05-15 20:00:00+00 | 7
When we query each of the currencies by their first set of data (to track its “age”), it’s clear that the market is not simply for investors, but also for creators of these digital assets. Most recently, a flood of new coins entered our dataset during May 25–28, amounting to over 300 new cryptocurrency records in less than a week. (Of course, these dates reflect when our data source first had price data for the currencies, which may not correspond to the ICO.)
Who’s at the head of the cryptocurrency long tail?
There are so many cryptocurrencies that it becomes hard to separate the good ones from the noise. How do we identify which ones worth focusing on? Here’s one metric: total trade volume over the past week.
-- 1200+ crypto currencies by total transaction volume (in btc) over the last month SELECT 'BTC' as currency_code, sum(b.volume_currency) as total_volume_in_usd FROM btc_prices b WHERE b.currency_code = 'USD' AND now() - date(b.time) < INTERVAL '8 day' GROUP BY b.currency_code UNION SELECT c.currency_code as currency_code, sum(c.volume_btc) * avg(b.closing_price) as total_volume_in_usd FROM crypto_prices c JOIN btc_prices b ON date(c.time) = date(b.time) WHERE c.volume_btc > 0 AND b.currency_code = 'USD' AND now() - date(b.time) < INTERVAL '8 day' AND now() - date(c.time) < INTERVAL '8 day' GROUP BY c.currency_code ORDER BY total_volume_in_usd DESC;
currency_code | total_volume_in_usd ---------------+--------------------- BTC | 2040879023.54 ETH | 1617388472.94011 LTC | 287613541.293571 XRP | 269417667.514443 ETC | 165712729.612886 ANS | 126377042.5269 SC | 111623857.796786 DASH | 86875922.3588143 ZEC | 78836728.2129428 BTS | 69459051.5958428
Quick note on what this query is doing: The BTC and crypto-currency data lives in separate tables. So we have to UNION the two queries. Also, as we established earlier, we want to quote volumes in a fiat currency (e.g., USD) and not BTC. So the second half of the query joins with the BTC table to convert BTC to USD.
Top cryptocurrencies (measuring by transaction volume) are (not surprisingly) Bitcoin and Ethereum. But after that, seems like Litecoin (LTC), Ripple (XRP), and Ethereum Classic (ETC) are not far off. As a five-year old coin, Litecoin is almost identical to Bitcoin and is often considered a key player in the market. Meanwhile, Ripple targets a more specific audience as a banking coin in the global commerce arena, also showing promise as a progressively dominant coin. Interestingly in the top 5 for our query, both ETH and ETC make appearances, suggesting a major shift towards Ethereum in the market.
What are the most profitable cryptocurrencies?
Another way to sift through the long tail of cryptocurrencies is by profitability (e.g., as measured by total daily return). Our data contains a set of prices for over 1200 cryptocurrencies. If we hone in on the highest increase in rate by day, we can see which cryptocurrencies lead the daily market.
-- Top crypto by daily return, by day SELECT time, last(currency_code,daily_factor), max(daily_factor) FROM ( SELECT currency_code, time, closing_price, lead(closing_price) over (partition BY currency_code ORDER BY time DESC) AS prev_day_closing_price, closing_price / lead(closing_price) over (partition BY currency_code ORDER BY time DESC) AS daily_factor FROM crypto_prices) q GROUP BY time ORDER BY time DESC;
time | last | daily_return ------------------------+--------+------------------ 2017-06-26 20:00:00+00 | CIN | 40.1428571428571 2017-06-25 20:00:00+00 | KC | 38 2017-06-24 20:00:00+00 | VOYA | 14.2747252747253 2017-06-23 20:00:00+00 | PAY | 3.18506315211422 2017-06-22 20:00:00+00 | YOVI | 119.607843137255 2017-06-21 20:00:00+00 | ION | 7.97665369649805 2017-06-20 20:00:00+00 | TES | 5.25157232704403 2017-06-19 20:00:00+00 | KNC | 150000 2017-06-18 20:00:00+00 | ZNY | 22.5217391304348 2017-06-17 20:00:00+00 | YOVI | 22.0590746115759 2017-06-16 20:00:00+00 | LTD | 9.50207468879668 2017-06-15 20:00:00+00 | AMIS | 168758.782201405 2017-06-14 20:00:00+00 | JANE | 6 2017-06-13 20:00:00+00 | YOVI | 690.636254501801 2017-06-12 20:00:00+00 | U | 5.21452145214522 2017-06-11 20:00:00+00 | JANE | 6 2017-06-10 20:00:00+00 | WGO | 3.58744394618834 2017-06-09 20:00:00+00 | BNT | 5000 2017-06-08 20:00:00+00 | XNC | 52.7704485488127 2017-06-07 20:00:00+00 | CBD | 14.3243243243243 2017-06-06 20:00:00+00 | CC | 72 2017-06-05 20:00:00+00 | BLAZR | 7.38461538461538 2017-06-04 20:00:00+00 | GREXIT | 13.0833333333333 2017-06-03 20:00:00+00 | EPY | 4.29880478087649 2017-06-02 20:00:00+00 | YOVI | 1257.67790262172 2017-06-01 20:00:00+00 | FCN | 8.57142857142857 2017-05-31 20:00:00+00 | EPY | 348.611111111111 2017-05-30 20:00:00+00 | BST | 14.1441860465116 2017-05-29 20:00:00+00 | FCN | 45.3086419753086 2017-05-28 20:00:00+00 | NOO | 56536.231884058
Here we identify the cryptocurrency with the highest total daily return, by day, going backwards in time. (To do that, we again use a window function to calculate daily returns, and then use the TimescaleDB last function to find the cryptocurrency with the highest return for that day.)
Our output for the last three months shows a numeric lead by AMIS (168x on 6/15), which appears as the cryptocurrency with the top increase for 15 distinct days. However, if we look more closely at AMIS’ rates, we realize that this high increase is also due to high fluctuation rates: AMIS tends to drop back to a closing price of zero after each increase.
Likewise, the cryptocurrency YOVI appears 3 times in our list of daily leads but has a similarly unreliable trend like AMIS:
While both trends are rather unstable, they show more promise than ETH’s down-sloping first year (2015):
(Repeat Disclaimer: TimescaleDB does not endorse any of these cryptocurrencies and is not liable for investments that you make / losses you may incur.)
So money grows on… Merkle Trees?
Here we drew some conclusions from public cryptocurrency datasets, highlighting the power of both PostgreSQL and TimescaleDB. Yet we should remember that the cryptocurrency market will inevitably be different next month, week, even day.
But if you’d like to explore this dataset and draw your own conclusions, here are the instructions for downloading it (and installing TimescaleDB).
If you’d like to learn more about TimescaleDB, and how it makes PostgreSQL scalable for time-series data, we’d recommend this technical post.
And if you’d like to learn more about TimescaleDB, please check out our GitHub (stars appreciated), and let us know how we can help.