A long time ago in a galaxy far, far away, I decided to end my academic career because working in industry allowed me to go all in on Bitcoin. However, after running Bitcoin R&D at Glassnode for more than a year, I realized that simply working with Bitcoin wasn’t enough. Building proprietary software and doing research behind closed doors conflicted with Bitcoin’s cypherpunk roots and open-source ethos, the very qualities that originally drew me to Bitcoin. Instead, I wanted to contribute! So I tried my luck, applied for a grant from Spiral… and, to my surprise, got accepted!
In the following, I’ll provide some details about what I’m planning to work on during my one-year grant period. But first, I’d like to thank everyone at Spiral who was involved in making this happen (especially Conor) for giving me the opportunity to contribute to the Bitcoin ecosystem!
Overview
During the grant period, I’ll be working with James O’Beirne on developing software and infrastructure to monitor the operational health of the Bitcoin network.
Some of the high-level insights we hope to generate concern attack and anomaly detection, transaction and block propagation times, mempool convergence, the composition of the network in terms of node software and version, and performance differences across node software and versions.
Getting to these insights, however, requires access to a host of off-chain data that is not readily available today, so creating software and infrastructure to collect this data is paramount.
The case for off-chain data
Bitcoin blockchain analysis is ubiquitous today because the data is easily accessible to anyone. This accessibility has given rise to the many websites that report statistics about the Bitcoin network, such as its hashrate, transaction throughput, fees, and much more.
However, for some investigations (including ours!), a blockchain-based view is too narrow. For one thing, blockchain data lacks precise timestamps: transactions carry none at all, and a block’s header timestamp is set by its miner and only loosely constrained, so the blockchain alone cannot support fine-grained analyses of transaction and block propagation through the network. For another, it misses transactions that never made it into a block, which provide crucial clues about the demand for block space, fee dynamics, replace-by-fee usage, and more. Also missing are invalid blocks and transactions sent over the network, which can serve as a basis for anomaly detection.
Fortunately, comprehensive off-chain data sets can be extracted from Bitcoin Core nodes. Unfortunately, there is currently no standardized way to collect this data robustly and automatically. Fixing this will be one of our first objectives: we plan to identify what data needs to be collected and how best to collect it (via the API, ZeroMQ, or tracing), and then develop a suitable implementation.
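As a rough illustration of the ZeroMQ route, the sketch below subscribes to a node’s `hashtx` and `hashblock` notifications and stamps each one with the local receive time. It assumes the node was started with options such as `zmqpubhashtx=tcp://127.0.0.1:28332` and `zmqpubhashblock=tcp://127.0.0.1:28332`, that the third-party `pyzmq` package is installed, and that all collecting hosts keep their clocks synchronized (e.g., via NTP). The `Arrival` record, `parse_notification`, `listen`, and the endpoint are hypothetical names for illustration, not part of any existing tool.

```python
import struct
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Arrival:
    """One observation: which node saw which hash, and when (local clock)."""
    node_id: str
    topic: str          # e.g. "hashtx" or "hashblock"
    hash_hex: str       # hash in the byte order shown by the RPC interface
    sequence: int       # per-topic message counter from Bitcoin Core
    received_at: float  # seconds since the Unix epoch, local clock


def parse_notification(node_id, frames, received_at):
    """Turn a Bitcoin Core ZMQ multipart message into an Arrival.

    Frames are [topic, body, sequence]; the sequence number is a 4-byte
    little-endian unsigned integer. For 'hashtx' and 'hashblock', the body
    is the 32-byte hash.
    """
    topic, body, seq = frames
    if len(seq) != 4:
        raise ValueError("expected a 4-byte sequence number")
    (sequence,) = struct.unpack("<I", seq)
    return Arrival(node_id, topic.decode(), body.hex(), sequence, received_at)


def listen(node_id, endpoint="tcp://127.0.0.1:28332"):
    """Yield Arrival records from one node (needs pyzmq and a running node)."""
    import zmq  # third-party: pip install pyzmq

    ctx = zmq.Context()
    sock = ctx.socket(zmq.SUB)
    for topic in (b"hashtx", b"hashblock"):
        sock.setsockopt(zmq.SUBSCRIBE, topic)
    sock.connect(endpoint)
    try:
        while True:
            frames = sock.recv_multipart()
            # Timestamp immediately on receipt to minimize local processing skew.
            yield parse_notification(node_id, frames, time.time())
    finally:
        sock.close()
        ctx.term()
```

Something like `for a in listen("node-a"): print(a)` would then stream observations from a single node. Note that Bitcoin Core also publishes `hashtx` for every transaction in a newly connected block, so a transaction can show up twice; downstream processing should keep only the first sighting per node.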
Infrastructure and data set
Once a means to collect off-chain data is available, the next step involves setting up suitable infrastructure for collecting and storing data. This includes:
- Automating the setup and deployment of a fleet of nodes to collect off-chain data
- Developing suitable ETL pipelines to process the collected data
- Selecting a suitable storage solution that allows easy access to, and further processing of, the data
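To make the ETL step more concrete, here is a minimal sketch that uses Python’s built-in `sqlite3` module as a local stand-in for whatever storage ends up being selected. It assumes raw observations arrive as hypothetical `(node_id, topic, hash_hex, received_at)` tuples; the table layout and the `first_sightings` and `load` helpers are illustrative only. The transform step drops repeat sightings, and the load step is idempotent, so a batch can be reprocessed without creating duplicates.

```python
import sqlite3

# Raw observation tuples are assumed to be: (node_id, topic, hash_hex, received_at)
SCHEMA = """
CREATE TABLE IF NOT EXISTS arrivals (
    node_id     TEXT NOT NULL,
    topic       TEXT NOT NULL,
    hash_hex    TEXT NOT NULL,
    received_at REAL NOT NULL,  -- seconds since the Unix epoch
    PRIMARY KEY (node_id, hash_hex)
)
"""


def first_sightings(records):
    """Transform: keep only the earliest observation per (node_id, hash_hex)."""
    earliest = {}
    for node_id, topic, hash_hex, received_at in records:
        key = (node_id, hash_hex)
        if key not in earliest or received_at < earliest[key][3]:
            earliest[key] = (node_id, topic, hash_hex, received_at)
    return list(earliest.values())


def load(conn, rows):
    """Load: upsert rows, keeping the earliest received_at if a row already exists."""
    conn.execute(SCHEMA)
    conn.executemany(
        """INSERT INTO arrivals (node_id, topic, hash_hex, received_at)
           VALUES (?, ?, ?, ?)
           ON CONFLICT (node_id, hash_hex)
           DO UPDATE SET received_at = MIN(received_at, excluded.received_at)""",
        rows,
    )
    conn.commit()
```

For example, `load(sqlite3.connect("arrivals.db"), first_sightings(batch))` would process one batch. The upsert syntax requires SQLite 3.24 or newer; a production pipeline would target the chosen storage system instead.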
Conversations with other researchers have shown us that their work could also benefit from an off-chain data set. We will therefore investigate how best to make a consolidated version of our data set publicly available in a form that is easy to process (e.g., from a Jupyter notebook, or with SQL via Amazon Redshift or Google BigQuery).
Operational health
Once off-chain data is available, the final step of the project involves the development of high-level indicators and their visualization.
Developing indicators involves devising ways to process the off-chain data into interpretable signals (e.g., how do we get from the arrival times of transactions or blocks on different nodes to a scalar that quantifies network convergence performance?).
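One possible answer to that question, sketched here under clearly hypothetical choices: for each transaction or block, measure the time from its first sighting on any monitored node until a quorum of nodes has seen it, then take the median across all items as the network-wide score. The function names, the `quorum_nodes` parameter, and the use of the median are assumptions for illustration, and the sketch assumes first-sighting records collected from clock-synchronized nodes.

```python
import math
import statistics
from collections import defaultdict


def convergence_delays(arrivals, quorum_nodes):
    """Per item, seconds from its first sighting until `quorum_nodes` nodes have seen it.

    arrivals: iterable of (node_id, item_id, received_at) tuples, with at most
    one tuple per (node_id, item_id), i.e. first sightings only.
    Items seen by fewer than `quorum_nodes` nodes get math.inf.
    """
    if quorum_nodes < 1:
        raise ValueError("quorum_nodes must be at least 1")
    times = defaultdict(list)
    for _node_id, item_id, received_at in arrivals:
        times[item_id].append(received_at)
    delays = {}
    for item_id, ts in times.items():
        ts.sort()
        if len(ts) >= quorum_nodes:
            # ts[quorum_nodes - 1] is when the quorum-th node saw the item.
            delays[item_id] = ts[quorum_nodes - 1] - ts[0]
        else:
            delays[item_id] = math.inf
    return delays


def convergence_score(arrivals, quorum_nodes):
    """Median quorum delay over all items; lower means faster convergence."""
    delays = convergence_delays(arrivals, quorum_nodes)
    if not delays:
        raise ValueError("no arrivals to score")
    return statistics.median(delays.values())
```

With ten monitored nodes, `convergence_score(records, quorum_nodes=9)` would report how long a typical item takes to reach nine of them. Using the median keeps a handful of slow or poorly connected nodes from dominating the score, while items that never reach the quorum count as infinitely slow, so the score degrades visibly if most items fail to propagate. The first sighting among monitored nodes is only a proxy for when an item was actually broadcast.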
The visualization step then involves creating a live view of these indicators on a website (e.g., using Grafana or plotly).