In this article, I attempt to explain the early source code of Bitcoin and relate it to the source code currently in use. My intent is to help you better understand the early history of Bitcoin and the enigmatic figure of Satoshi Nakamoto. Newcomers to Bitcoin or to the blockchain space may find this article fairly complex, as my target audience is intermediate and more experienced users. Despite this, I have sought to make it as understandable as possible.
Bitcoin is the cryptocurrency par excellence and the first to be developed and released. Created in late 2008 by a developer, or a group of developers, known as Satoshi Nakamoto, Bitcoin now represents the concept of an alternative currency that plays an important role in today’s zeitgeist, so much so that the terms “cryptocurrency” and “Bitcoin” are often used synonymously.
In the early years of its development, many enthusiasts were captivated by a novel idea: to create a currency that was untethered from any central entity. Almost 13 years later, we can say that Satoshi has succeeded in instilling these core ideals in Bitcoin enthusiasts and non-enthusiasts alike.
There is a story, however, unknown to many, that reveals something about how those behind Satoshi Nakamoto’s anonymous identity operated. This is a glimpse into the story behind Bitcoin’s “alternative” genesis block. Don’t worry, we are not talking about any hacks; this is just an alternative network that was born before January 2009. Note that we will refer to Satoshi as a single person, although you should keep in mind that some research suggests Satoshi may have been a group of developers.
Like any good story, let’s start from the beginning, when Satoshi posted on a metzdowd mailing list the link to Bitcoin’s white paper ― a rather “revolutionary” idea. So “revolutionary” that the mailing list administrators urged users not to talk about economic policies, but rather to focus on the technical aspects and impact of the technology.
For background, Cryptography@Metzdowd is a mailing list created to discuss topics and news related to cryptographic technology and its political impact. On October 31, 2008, at 14:10:00 EDT (18:10:00 UTC), Satoshi sent his first e-mail to cryptography@metzdowd, saying that he had published a white paper describing a new technology called Bitcoin.
As replies came in and the conversation about Bitcoin grew, many people began discussing the development of the new currency with Satoshi. Within this public e-mail correspondence, one important passage comes from a November 17, 2008 e-mail sent to James A. Donald.
Satoshi responds with the following:
I believe I’ve worked through all those little details over the last year and a half while coding it, and there were a lot of them. The functional details are not covered in the paper, but the sourcecode is coming soon. I sent you the main files. (available by request at the moment, full release soon)
The source code pre-release is an excellent resource for anyone who wants to start looking into Bitcoin’s source code. In fact, the pre-release can be considered a prototype (a minimum viable product, or MVP) of what would become Bitcoin.
Early on, the source code was only available upon request, and at that time Satoshi was not as well known as today. In late 2013, one user who had received the source code (Cryddit, aka Ray Dillinger) posted it on the BitcoinTalk.org forum. There, Cryddit posted the contents of the source files as he had received them; however, posting them on a forum effectively removed important information associated with those files, such as their metadata.
You can still download a copy from the Nakamoto Institute (bitcoin-nov08.tgz). The source code of the very first version of Bitcoin consists of four files:
- `node.cpp` and `node.h` – code for the node (acceptance of blocks)
- `main.cpp` and `main.h` – wallet, transactions, blocks (read from disk), and genesis block
Of course, the source code does not contain all the files needed to generate a binary. It can be assumed that Satoshi did not include all the files for fear that someone else might copy his idea or, more simply, because he was still working on it. It is also confirmed that the very first version of the source code was heavily modified and that many of the original comments were removed.
Within the source code, some files are mentioned but missing. In particular, we find:
- `headers.h` – probably a global file that contained all references to libraries (Boost)
- `sha.h` – library file that provides the SHA hashing algorithm
One of the first oddities, when comparing this source code with the later but more complete Bitcoin 0.1 release, is the inclusion of the `sha.h` header file. Satoshi seems to have forgotten to remove it, since `sha.h` is not used anywhere. Also, the `sha.h` file was not written by Satoshi, since it is public domain code (more specifically, from Crypto++).
Another (and possibly more plausible) hypothesis therefore comes to mind: by releasing the very first version of the source code, Satoshi wanted to get feedback from experts on the most important parts of the project, leaving out everything he considered superfluous. In particular, Satoshi was trying to get feedback on the networking, transaction management, and blockchain elements.
Another oddity, found within `node.cpp` (function `ThreadBitcoinMiner`), is the mention of a miner (function `BitcoinMiner()`) which, however, is not actually included within the source code. The file `script.cpp` has not been included either, nor have the other files for public/private key generation.
Now that we have an overview of the source code, we can delve into a story that sounds almost unbelievable. Let’s start with the introduction of the blockchain. In the white paper describing Bitcoin, Satoshi outlines a chain into which blocks containing transactions are placed. Within the very first version of the source code, the chain was called a “timechain.”
Blocks are bound together in a “chain”: each block contains the hash of the previous block, which connects the two blocks mathematically. If a block has an invalid hash, all subsequent blocks linked to it are invalid as well.
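As a minimal sketch of this linking (not Satoshi’s code), the check below walks a list of blocks and reports the first one whose `hashPrevBlock` does not match the hash of its predecessor. The `Block` struct, `FirstBrokenLink`, and `ToyBlockHash` are hypothetical names, and the toy hash is only a readable stand-in for the double SHA-256 that Bitcoin uses.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified block: a link to the parent plus some content.
struct Block {
    std::string hashPrevBlock;  // hash of the previous block
    std::string payload;        // stands in for the rest of the block
};

using HashFn = std::function<std::string(const Block&)>;

// Returns the index of the first block whose hashPrevBlock does not match
// the hash of the block before it, or chain.size() if every link is intact.
std::size_t FirstBrokenLink(const std::vector<Block>& chain, const HashFn& hash) {
    for (std::size_t i = 1; i < chain.size(); ++i) {
        if (chain[i].hashPrevBlock != hash(chain[i - 1])) return i;
    }
    return chain.size();
}

// Toy stand-in for a real hash function (assumption: illustration only).
std::string ToyBlockHash(const Block& b) {
    return "H(" + b.hashPrevBlock + "|" + b.payload + ")";
}
```

Changing the contents of an early block changes its hash, so the next block’s `hashPrevBlock` no longer matches and every later block loses its anchor.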
Of all the blocks placed on the chain, one in particular is very special: the genesis block, the first block on the blockchain, mined by the author of the source code. This block is the point of creation of the blockchain and is the only block actually issued by a centralized authority.
To understand more technically how the genesis block is inserted within the blockchain, we can analyze the very first version of Bitcoin. Here, we are interested in finding out how the blockchain is initialized and which block it inserts. The genesis block is inserted via the `LoadBlockIndex` function. The following comment appears within the code, probably to verify that it works properly:
//// debug
// Genesis Block:
// GetHash() = 0x000006b15d1327d67e971d1de9116bd60a3a01556c91b6ebaa416ebc0cfaa646
// hashPrevBlock = 0x0000000000000000000000000000000000000000000000000000000000000000
// hashMerkleRoot = 0x769a5e93fac273fd825da42d39ead975b5d712b2d50953f35a4fdebdec8083e3
// txNew.vin[0].scriptSig = 247422313
// txNew.vout[0].nValue = 10000
// txNew.vout[0].scriptPubKey = OP_CODESEPARATOR 0x31D18A083F381B4BDE37B649AACF8CD0AFD88C53A3587ECDB7FAF23D449C800AF1CE516199390BFE42991F10E7F5340F2A63449F0B639A7115C667E5D7B051D404 OP_CHECKSIG
// nTime = 1221069728
// nBits = 20
// nNonce = 141755
// CBlock(hashPrevBlock=000000, hashMerkleRoot=769a5e, nTime=1221069728, nBits=20, nNonce=141755, vtx=1)
// CTransaction(vin.size=1, vout.size=1, nLockTime=0)
// CTxIn(COutPoint(000000, -1), coinbase 04695dbf0e)
// CTxOut(nValue=10000, nSequence=4294967295, scriptPubKey=51b0, posNext=null)
// vMerkleTree: 769a5e
To make this easier to follow, let’s go through the fields one by one and note some differences with the current, standard version of Bitcoin. The block definition is available in the `main.h` file, from which we quote the section where the fields are defined.
class CBlock
{
public:
    // header
    uint256 hashPrevBlock;
    uint256 hashMerkleRoot;
    unsigned int nTime;
    unsigned int nBits;
    unsigned int nNonce;
    // network and disk
    vector<CTransaction> vtx;
    // memory only
    mutable vector<uint256> vMerkleTree;
    // ...
};
The block hash (the `GetHash()` value in the debug comment) is the result of applying a hash function to the block header. A hash function takes data of any size as input and produces a fixed-size sequence of bits that depends on the input. From that sequence of bits, the original input cannot easily be recovered.
We know that each block can be divided into a header (a global data set) and a body (the main content of the block, i.e. the transactions). The function `sha256(sha256(header_block))` returns the hash `0x000006b15d1327d67e971d1de9116bd60a3a01556c91b6ebaa416ebc0cfaa646`. The meaning of the block hash has not changed in the current version of Bitcoin.
For the chain of blocks to make sense, each block must contain a reference to the previous block (the `hashPrevBlock` field). Using a hash to refer to the previous block is very effective for two reasons.
The first is that it creates a verifiable mathematical link that allows a node to ignore any blocks that do not belong to the chain. The second is that it makes searching for a block within the chain efficient.
One of the best-known data structures for fast searching is the hash table. Simply put, we apply a hash function to each element and map each hash to a memory location known to us. To check whether an element exists, we simply access the memory location to which its hash was mapped. This lets us search for a block within the chain by its hash in constant time, O(1), on average. Of course, the way blocks are looked up varies from project to project.
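A minimal sketch of such a lookup, assuming blocks are indexed by their hex-encoded hash in a `std::unordered_map`. The `BlockRecord` struct and the `FindBlock` helper are hypothetical and are not taken from Bitcoin’s code.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical, simplified block record; field names follow the article.
struct BlockRecord {
    std::string hashPrevBlock;  // hex hash of the parent block
    unsigned int nTime;         // timestamp in Unix seconds
};

// Hypothetical index: hex block hash -> block record.
using BlockIndex = std::unordered_map<std::string, BlockRecord>;

// Average-case O(1) lookup by hash; returns nullptr when the hash is unknown.
const BlockRecord* FindBlock(const BlockIndex& index, const std::string& hash) {
    auto it = index.find(hash);
    return it == index.end() ? nullptr : &it->second;
}
```

The caller never scans the chain: one hash-table probe either finds the block or reports that it is unknown.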
In the particular case of the genesis block, the hash of the previous block is set to 0 because the first block does not have any parent block. The field has the same meaning in the current version of Bitcoin.
This is not an issue with Bitcoin itself, but it is a good chance to say a few words about genesis block verification. To determine whether a block d is the genesis block, you should compare the hash of d with the hash of the genesis block (known a priori). It is not advisable to check the condition `d.hashPrevBlock === 0` instead: there is no guarantee that only the genesis block carries a previous-block hash of 0 (or, more dangerously, there is no guarantee that some other value is not converted to 0 by some esoteric programming language).
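The recommended check can be sketched as follows. The `Block` struct and the `IsGenesisBlock` helper are hypothetical; the constant is the pre-release genesis hash quoted in the debug comment above.

```cpp
#include <string>

// Hypothetical, simplified block with hex-encoded hashes.
struct Block {
    std::string hash;
    std::string hashPrevBlock;
};

// Hash of the pre-release genesis block, as printed in the debug comment.
const std::string kGenesisHash =
    "000006b15d1327d67e971d1de9116bd60a3a01556c91b6ebaa416ebc0cfaa646";

// Identify the genesis block by its own hash, not by hashPrevBlock being 0.
bool IsGenesisBlock(const Block& block) {
    return block.hash == kGenesisHash;
}
```

A block that merely claims an all-zero parent is rejected, because its own hash does not match the known constant.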
The Merkle Tree is a data structure heavily used within Bitcoin. It belongs to the family of tree data structures, which are composed of nodes. To get a better sense of this concept, assume we have a graph in which individual nodes are connected to each other. In general, a graph can extend in any direction, whether vertical, horizontal, or otherwise.
A tree is a particular graph that runs from top to bottom: at the top is the node from which the graph starts, and moving gradually from top to bottom we find the other nodes. We call the nodes that are connected below a node x the children of x. The node that is “above” x is called the parent of x. A tree has a root vertex, and from it arise various arcs (“branches” in the tree analogy) that connect the root to new vertices.
Each vertex can, in turn, have branches that originate from it and point to new vertices. The final vertices, with no outgoing branches, are called “leaves.” The Merkle Trees that Bitcoin uses are constructed starting from the leaves. Each leaf contains the hash of a transaction (if the number of leaves is odd, the last one is duplicated); one pair at a time, the contents of the leaves are concatenated and the hash function is applied to create a new vertex. The process is repeated layer by layer until only two vertices remain, which, when concatenated and hashed, create the root hash.
Therefore, to verify the data structure that contains the transactions, it is sufficient to check the hash of the root of the Merkle Tree, i.e. the `hashMerkleRoot` field. The Merkle Tree is a compact way of representing the set of transactions, and it is used as a kind of checksum. For a new block, a node checks that the block’s transaction set, when run through the Merkle Tree construction, returns exactly the root hash stored in the block.
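The construction described above can be sketched as follows. This is an illustration under stated assumptions rather than Bitcoin’s implementation: `MerkleRoot` and `ToyHash` are hypothetical names, the hash function is passed in as a parameter, and the toy hash only wraps its input in `H(...)` so that the shape of the tree stays visible (Bitcoin uses double SHA-256 here).

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using HashFn = std::function<std::string(const std::string&)>;

// Builds the Merkle root from the leaf hashes, one layer at a time.
// When a layer has an odd number of nodes, the last node is paired with itself.
std::string MerkleRoot(std::vector<std::string> level, const HashFn& hash) {
    if (level.empty()) return "";
    while (level.size() > 1) {
        std::vector<std::string> next;
        for (std::size_t i = 0; i < level.size(); i += 2) {
            const std::string& left = level[i];
            const std::string& right = (i + 1 < level.size()) ? level[i + 1] : level[i];
            next.push_back(hash(left + right));  // parent = hash(left || right)
        }
        level = std::move(next);
    }
    return level[0];
}

// Toy stand-in for a real hash function (assumption: illustration only).
std::string ToyHash(const std::string& data) {
    return "H(" + data + ")";
}
```

With three leaves `a`, `b`, `c`, the sketch yields `H(H(ab)H(cc))`. With a single leaf, the root is the leaf itself, which matches the debug output above, where `vMerkleTree` and `hashMerkleRoot` both show `769a5e`.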
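Without the Merkle Tree, checking the transactions of a block against its header would mean handling all N transactions of that block directly. Assuming M blocks of N transactions each, the time would be proportional to `O(N * M)`. With the Merkle Tree, once the root has been computed, comparing two transaction sets is a single hash comparison, `O(1)`, and proving that one transaction belongs to a block takes only about log2(N) hashes. For this block, the value is `0x769a5e93fac273fd825da42d39ead975b5d712b2d50953f35a4fdebdec8083e3`.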
The content of the block is represented by a set of transactions: `txNew` is the transaction built for the genesis block, and each block (class `CBlock`) holds a vector of transactions called `vtx`.
We summarize the fields for a transaction in the table below:
| Field | Description |
|---|---|
| `vin` | Vector of inputs, i.e. the inputs of the transaction. |
| `vout` | Vector of outputs, i.e. the outputs of the transaction. |
| `nLockTime` | Integer associated with the transaction: indicates how much time should elapse between the block being validated and the transaction outputs becoming spendable. |
The transaction in the first block is a coinbase transaction, consisting of one input and one output. We recognize that it is a coinbase transaction (i.e. coin creation) from two key elements: the number of inputs is equal to 1 and the previous output referenced by that input is `NULL` (this condition is explicit in Satoshi’s code). This transaction pays the person who mined the block, that is, Satoshi, who thereby obtained the first bitcoins. In this block, Satoshi refers to Bitcoin’s smaller denomination as a “cent” (10,000) rather than “satoshis”.
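The two conditions can be sketched with simplified stand-ins for Satoshi’s classes. The struct layout below is an assumption for illustration; the “null” previous output is modeled as an all-zero hash with index -1, matching the `COutPoint(000000, -1)` shown in the debug output.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Hypothetical, simplified reference to a previous transaction output.
struct OutPoint {
    std::array<unsigned char, 32> hash{};  // all zeros by default
    int n = -1;                            // output index; -1 means "none"

    bool IsNull() const {
        return n == -1 &&
               std::all_of(hash.begin(), hash.end(),
                           [](unsigned char b) { return b == 0; });
    }
};

struct TxIn {
    OutPoint prevout;  // the output this input spends
};

struct Transaction {
    std::vector<TxIn> vin;

    // Coinbase: exactly one input, and that input spends nothing.
    bool IsCoinBase() const {
        return vin.size() == 1 && vin[0].prevout.IsNull();
    }
};
```

A transaction whose single input points at a real previous output, or one with no inputs at all, is not treated as a coinbase.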
Unfortunately, due to a problem also present in the first public version of Bitcoin, Satoshi could never spend the money from that transaction. As we can verify with the code, when inserting the genesis block within the blockchain, the developer should also insert the transaction into the data structure that contains all transactions. Satoshi, however, did not include the first transaction: The block therefore exists; however, the transaction does not exist for the system ― even though it remains included inside the genesis block.
In the output field, the amount for the first transaction is set to 10,000 (`nValue` field). A `scriptPubKey` is also specified: this is a field that specifies a certain condition (in this case `OP_CHECKSIG`). If this condition is true, then the transaction is valid and the amount can be spent. For a more in-depth overview of `scriptPubKey`, I recommend reading the “Scripts” section of the Bitcoin wiki.
Let’s take, for example, the construction of a normal transaction in which we want to use an input “A,” which is the output of a previous transaction. The previous transaction had specified, for “A,” a `scriptPubKey`, which in the simplest case contains a public key and a request to sign with that key (`OP_CHECKSIG`). When building the transaction that uses “A” as an input, a `scriptSig` must be provided, which contains a signature over the transaction being built, made with the key specified by the previous `scriptPubKey`. The “transaction being built” is the transaction with all fields filled in except the `scriptSig` fields, which must be empty. The peculiarity of the genesis block is that its `scriptSig` field is completely arbitrary: there was no previous transaction from which to take validation rules, so Satoshi could enter anything.
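The debug output illustrates this: the `scriptSig` is just the number 247422313, and the same value appears in the `CTxIn` line as `coinbase 04695dbf0e`. Below is a minimal sketch of that correspondence, assuming the script stores the number as a length byte followed by its bytes in little-endian order. The `PushIntHex` helper is hypothetical and ignores the sign handling that a full big-number encoding would need.

```cpp
#include <cstdint>
#include <string>

// Encodes a non-negative integer as: one length byte, then the minimal
// little-endian bytes of the value. Returns the result as a hex string.
// Simplification (assumption): no sign byte is added for values whose top bit is set.
std::string PushIntHex(std::uint32_t value) {
    static const char* digits = "0123456789abcdef";

    // Collect the value's bytes, least significant first.
    std::string bytes;
    while (value != 0) {
        bytes.push_back(static_cast<char>(value & 0xff));
        value >>= 8;
    }

    std::string hex;
    auto append = [&hex](unsigned char b) {
        hex.push_back(digits[b >> 4]);
        hex.push_back(digits[b & 0x0f]);
    };
    append(static_cast<unsigned char>(bytes.size()));  // length prefix
    for (char b : bytes) append(static_cast<unsigned char>(b));
    return hex;
}
```

Since 247422313 is `0x0EBF5D69`, its little-endian bytes are `69 5d bf 0e`, and the four-byte length prefix gives `04695dbf0e`.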
The name of the `nTime` field is rather self-descriptive: it is a timestamp that represents the number of seconds that have passed since the Unix Epoch (January 1, 1970). For this block, the timestamp value is `1221069728`, which corresponds to Wednesday, September 10, 2008, at 18:02:08 (GMT).
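The conversion is easy to reproduce with the C++ standard library. The `FormatUtc` helper below is a hypothetical name; it formats a Unix timestamp as a UTC date and time.

```cpp
#include <ctime>
#include <string>

// Formats a Unix timestamp (seconds since January 1, 1970) as UTC.
std::string FormatUtc(std::time_t timestamp) {
    std::tm utc = *std::gmtime(&timestamp);  // break down into UTC calendar fields
    char buffer[32];
    std::strftime(buffer, sizeof buffer, "%Y-%m-%d %H:%M:%S", &utc);
    return buffer;
}
```

Calling `FormatUtc(1221069728)` returns `2008-09-10 18:02:08`.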
In fact, the block appears to have been added on September 10, 2008. “Appears to have been added” because it is not certain that it was added to the chain on that very day. Rather, September 10, 2008 reminds us of a very important event.
September 10, 2008: Lehman Brothers and Q3 Results
On that day, the global investment bank Lehman Brothers published its third quarter results, posting an approximate $3.9 billion loss ― ultimately declaring bankruptcy five days later. Therefore, it is unclear whether indeed the timestamp is the real one or was artificially inserted. Satoshi is no stranger to including some clues that hark back to the 2008 crisis (hence, it is easy to see how Satoshi was against traditional payment systems).
The choice of such a date for the first block may be nothing more than a strange coincidence. For those interested, I went through some articles published by the UK Times on that day and picked the one I would have named the blockchain after. If I had been Satoshi, I would have chosen this one: “The Times 10/Sept/2008 Lehman sells property assets on $3.9bn loss”.
Note that this type of association is very “speculative” and there is no other evidence as to whether Satoshi intended to link this event to the blockchain. However, the coincidence remains curious.
The `nBits` field is the only field that differs substantially from the current version of Bitcoin. In the current, standard version, it encodes the target: the hash of the block header must be less than or equal to the target for the block to be accepted by the network. The lower the value of the target, the harder it is to mine the block.
In the very early version, the `nBits` field was also about mining, but it represents the minimum amount of “work” that must be done before a block is accepted by the network. The meaning is therefore inverted: here a higher value means more work, whereas in the current version a lower target means more work. A value of 20 is consistent with the genesis hash shown above, which begins with five hexadecimal zeros (20 zero bits). In fact, the value of the `nBits` field is equal to the constant `MINPROOFOFWORK` declared in the `main.h` file, with the comment “ridiculously easy for testing.”
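Below is a minimal sketch of such a check, assuming `nBits` counts the leading zero bits that the block hash must have. This reading is an assumption made for illustration, suggested by `nBits = 20` and by the genesis hash starting with five hexadecimal zeros; `LeadingZeroBits` and `MeetsProofOfWork` are hypothetical helpers.

```cpp
#include <string>

// Counts the leading zero bits of a hash given as a hex string (no "0x" prefix).
// Returns -1 if a non-hex character is encountered.
int LeadingZeroBits(const std::string& hex) {
    int bits = 0;
    for (char c : hex) {
        int v = (c >= '0' && c <= '9') ? c - '0'
              : (c >= 'a' && c <= 'f') ? c - 'a' + 10
              : (c >= 'A' && c <= 'F') ? c - 'A' + 10
              : -1;
        if (v < 0) return -1;
        if (v == 0) {
            bits += 4;  // a zero hex digit contributes four zero bits
            continue;
        }
        // Count the zero bits at the top of the first non-zero digit, then stop.
        for (int mask = 8; (v & mask) == 0; mask >>= 1) ++bits;
        return bits;
    }
    return bits;
}

// Assumption: a block is acceptable when its hash has at least nBits leading zero bits.
bool MeetsProofOfWork(const std::string& hashHex, int nBits) {
    return LeadingZeroBits(hashHex) >= nBits;
}
```

Under this assumption, the genesis hash `000006b1…` has 21 leading zero bits (five zero digits, plus one zero bit at the top of the digit `6`), so it satisfies `nBits = 20`.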
The nonce (the `nNonce` field) is an arbitrary number chosen by the miner that is used to meet the constraint on the hash: the hash of the block must begin with a certain number of zeros. The nonce field is the same one used in the current version of Bitcoin.
Below is the code given in `main.cpp` to insert the genesis block inside the blockchain.
CTransaction txNew;
txNew.vin.resize(1);
txNew.vout.resize(1);
txNew.vin[0].scriptSig = CScript() << 247422313;
txNew.vout[0].nValue = 10000;
txNew.vout[0].scriptPubKey = CScript() << OP_CODESEPARATOR << CBigNum("0x31D18A083F381B4BDE37B649AACF8CD0AFD88C53A3587ECDB7FAF23D449C800AF1CE516199390BFE42991F10E7F5340F2A63449F0B639A7115C667E5D7B051D404") << OP_CHECKSIG;
CBlock block;
block.vtx.push_back(txNew);
block.hashPrevBlock = 0;
block.hashMerkleRoot = block.BuildMerkleTree();
block.nTime = 1221069728;
block.nBits = 20;
block.nNonce = 141755;
We thus discover that the first block mined by Satoshi is the genesis block of an alternative network, probably used for testing purposes. The hash of the block is `0x000006b15d1327d67e971d1de9116bd60a3a01556c91b6ebaa416ebc0cfaa646`.
This is the first block (also referred to as the “alternate block” or “pre-genesis block”) that Satoshi placed within his blockchain. Indeed, recall that this block is not carried over anywhere else. If it were not for the source code released on the BitcoinTalk forum, no one would know the story behind this particular hash. The blockchain from which Bitcoin officially starts does not, in fact, contain this pre-genesis block, but rather another one that includes the famous phrase “The Times 03/Jan/2009 Chancellor on brink of second bailout for banks” from which it all started.
However, why not include this pre-genesis block within the Bitcoin blockchain? Satoshi had to act with maximum transparency. It was not permissible for the creator of the project to start with a blockchain already composed of blocks (with transactions). Creating a new genesis block that could have originated on the code release date made sense.
Compared to the version currently in use, we do not find the `version` field indicating the version of the block. The lack of this field seems to cause a problem, as confirmed by a comment inside the `ConnectBlock` method: `//// issue here: it doesn't know the version`. To be nitpicky and mischievous, we can note that this is one of the few comments to use four slashes instead of two. The other comment that arouses curiosity is this one: `//// is this all we want to do if there's a file error like this?` Who was Satoshi referring to when speaking in the plural?
Had Satoshi actually been referring to a group of people with whom there was collaboration? Could Satoshi perhaps have been referring to the early reviewers on the Metzdowd mailing list? Satoshi, however, is no stranger to introducing red herrings, i.e., details that could send readers down false trails. This comment, along with the others, could be one of them.
The identity behind Satoshi has always been a mystery since the beginning of the Bitcoin story. Several speculators have pointed fingers at prominent figures in the fields of computing and economics. From global figures (e.g. Elon Musk) to fanatics, it is clear that this puzzle excites many people.
Stylometric analyses were performed on texts produced by Satoshi, including messages posted within the BitcoinTalk forum, e-mails and the white paper, the main text of Satoshi’s production. In addition, to further this analysis, information has been gathered on the first people who orbited close to Satoshi (e.g. Hal Finney, Nick Szabo).
Over the years, characters such as Craig Wright have come forward claiming to be Satoshi. Of course, the only way to verify whether a person is really Satoshi is through PGP keys. If a person manages to sign a message with Satoshi’s private key, there are two possibilities: The person is really Satoshi or the private key has been stolen.
Some have ventured that a group of people may be behind Satoshi, which would explain why stylometric analysis has failed (or perhaps why there has been much bias in the studies). Within the Bitcoin community, there are some studies that refute or confirm the link between a person and Satoshi; however, these studies analyze a rather restrictive dataset and a somewhat limited number of people.
However, it is possible to reveal a very important detail about Satoshi Nakamoto’s personality. The fact that Satoshi chose a genesis block with a specific timestamp (and included the headline from the UK Times article) shows how much preparation and meticulousness lay behind Satoshi’s identity. Satoshi did not want the source code preview to be published. The deliberate planning behind the coin, some of the comments identified within the early SVN commits, and the perfectionism Satoshi was aiming for might suggest a figure created “ad hoc” by a group of people. All of this, of course, remains only speculation; to actually attribute an identity to Satoshi would require more effort on the part of the community, sourcing as much information as possible from any and every conversation.
As always, if you have any additional information or if you have spotted any errors, please contact me via email (PGP key). If you like this post let me know. In the future, I may continue with more analysis on the source code of the very first version of Bitcoin.