Ethereum can be confusing. The mere question of how large the blockchain is – 100 gigabytes or one terabyte – presents many people with a challenge. It is equally difficult to define what a full node is and to explain how to read the blockchain.
We try to bring a little light into the darkness with the help of Afri Schoedon from Parity.
Ethereum is different from Bitcoin and often overwhelms even people you would consider experts. Nothing shows this as clearly as a short episode in the constant propaganda war of the crypto scene.
Someone with the pseudonym “StopAndDecrypt” published an article whose title claims that the Ethereum blockchain is now larger than a terabyte. “StopAndDecrypt” is, it should be noted, a moderator of r/Bitcoin (some say “censor”) and appears on many other social media platforms as a vehement advocate for the Bitcoin Core cause. His article concludes, not surprisingly, that Ethereum is doomed because of its large blockchain and that no currency other than Bitcoin Core is viable. What a surprise.
Most people in the Ethereum scene have commented relatively politely on the article: it is absurdly wrong in the details, but it does hit one or two valid points. The blockchain is by no means a terabyte in size, but the high data load is a problem for Ethereum.
The topic drags a long tail of questions behind it: How big is the blockchain really? How does anyone arrive at a size of one terabyte? Why do Ethereum nodes have so many different synchronization modes? And how much of a burden is the amount of data for the network? Afri Schoedon, release manager of Parity, an Ethereum node software, answered these and other questions. The result was an odyssey in several stages.
I. The Blockchain of Ethereum is not even half as big as that of Bitcoin
The size of a blockchain is easy to determine: It is the sum of all blocks. It’s pretty straightforward.
“The blockchain of Ethereum contains all transactions. That is currently about 70 to 80 gigabytes. Once you download it, you can calculate everything that ever happened,” Afri explains. The Ethereum blockchain is not – as claimed by StopAndDecrypt – larger than one terabyte, but in reality not even half as large as that of Bitcoin, which currently weighs in at 170 gigabytes.
According to Etherstats.io statistics, there is a new block about every 15 seconds with a maximum size of 25 kilobytes. This means that the blockchain grows by about one megabyte every ten minutes – the same rate as Bitcoin.
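The growth rate follows directly from the two figures quoted above. As a quick check, here is a small sketch that treats every block as completely full, which is an upper-bound assumption rather than a measurement:

```python
# Figures quoted in the article; both are rough values.
BLOCK_INTERVAL_S = 15    # seconds between two blocks
MAX_BLOCK_SIZE_KB = 25   # maximum block size in kilobytes


def growth_kb(window_s: int) -> float:
    """Upper bound on chain growth (in KB) during a window of seconds."""
    blocks = window_s / BLOCK_INTERVAL_S
    return blocks * MAX_BLOCK_SIZE_KB


ten_minutes_kb = growth_kb(10 * 60)
print(ten_minutes_kb)  # 1000.0 KB, i.e. about one megabyte
```

Forty blocks of at most 25 kilobytes each fit into ten minutes, which gives the roughly one megabyte mentioned above.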
So far everything is manageable.
II. Synchronize a Node
The first synchronization of a node is also relatively easy to explain: With both Bitcoin and Ethereum, the node downloads the blockchain, block by block, and calculates a state from it. Bitcoin calls this state the UTXO set, Ethereum simply calls it the state.
The UTXO set is just a list of coins that have not yet been spent. To build it, the node must go through the chain block by block: Each transaction destroys existing coins and creates new ones. In this way, a node updates the UTXO set from the genesis block to the present.
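How a UTXO set is built up block by block can be sketched in a few lines. This is a toy model, not Bitcoin's actual data format: the `Tx` class, its fields and the function name `apply_utxo_block` are made up for illustration, and signatures, scripts and fees are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """Hypothetical, heavily simplified transaction."""
    txid: str
    inputs: tuple   # (txid, output_index) pairs that are spent
    outputs: tuple  # (address, amount) pairs that are created


def apply_utxo_block(utxos: dict, block: list) -> None:
    """Update the UTXO set in place with one block of transactions."""
    for tx in block:
        for outpoint in tx.inputs:
            # Raises KeyError if the coin is unknown or already spent.
            del utxos[outpoint]
        for index, (address, amount) in enumerate(tx.outputs):
            utxos[(tx.txid, index)] = (address, amount)


chain = [
    [Tx("a", (), (("alice", 50),))],                       # coin creation
    [Tx("b", (("a", 0),), (("bob", 20), ("alice", 30)))],  # alice pays bob
]

utxos = {}
for block in chain:
    apply_utxo_block(utxos, block)

print(utxos)
```

After both blocks, the original coin `("a", 0)` has disappeared from the set and two new coins exist in its place.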
The state of Ethereum is different. More complicated. “It contains the balance or state of, among other things, every smart contract,” Afri explains. Smart contracts are like algorithms in a computer program, and their state is the result. For a token contract, it can be a list of account balances, for the DAO the results of votes. And so on.
But the state is built in the same way as a UTXO set: block by block, until you have arrived in the present. When an Ethereum node synchronizes, it recalculates the state with each block. The interim results, the historical states, “are only important for calculating the new state. Usually you throw them away. There are hardly any applications where you need an old state.”
This is exactly what a “normal” Ethereum node does: It downloads the blockchain, calculates a state for each block in memory, but only stores the current one. Depending on the system, it takes between 12 hours and a week before it is finished. According to Afri, the performance secret for synchronization is “as large a cache and as much RAM as possible”.
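The same idea can be sketched for an account-based state. The example below is a simplification under loud assumptions: transactions are plain transfers of the form `(sender, recipient, amount)`, there are no smart contracts, gas or nonces, and the function name `apply_state_block` is invented for this sketch.

```python
def apply_state_block(state: dict, block: list) -> dict:
    """Return the new account state after applying one block."""
    new_state = dict(state)
    for sender, recipient, amount in block:
        if new_state.get(sender, 0) < amount:
            raise ValueError(f"{sender} cannot afford {amount}")
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


genesis = {"alice": 100}
chain = [
    [("alice", "bob", 30)],
    [("bob", "carol", 10), ("alice", "carol", 5)],
]

state = genesis
for block in chain:
    # The previous state is only needed to compute the next one;
    # rebinding the variable discards it.
    state = apply_state_block(state, block)

print(state)
```

Only the last state survives the loop, which mirrors what a regular full node keeps on disk.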
III. How does the node know the history of an address?
Since a node, in both Bitcoin and Ethereum, stores only the blockchain and the final state, reconstructing the history is not quite trivial: How does the node know which transactions were sent from or to an address if they are no longer part of the current state?
The simplest variant is that the node “experiences” it: it already knows the address and sees a change of state arrive with a new block. Then it knows what to look for and stores the relevant information in the wallet file. But what if you don’t have the addresses yet and import them later? This question shows the differences between a Bitcoin node and an Ethereum node.
With Bitcoin there are two possibilities. First, the node rescans the blockchain. This happens when you import a private key, for example: the program searches the blockchain for the corresponding transactions, and a few minutes later it has the history of the address. Second, you can start the node with the flag “-txindex=1”. It then builds an index of all transactions during synchronization. This requires a few gigabytes of additional disk space.
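The difference between the two approaches can be illustrated with a toy chain. This is only a sketch: the tuple layout `(txid, sender, recipient)` and the function names `rescan` and `build_txindex` are assumptions made for this example and do not reflect how Bitcoin stores transactions.

```python
# Toy chain: each block is a list of (txid, sender, recipient) tuples.
chain = [
    [("tx1", "alice", "bob")],
    [("tx2", "bob", "carol"), ("tx3", "alice", "carol")],
]


def rescan(chain: list, address: str) -> list:
    """Walk the whole chain and collect every transaction of an address."""
    return [
        txid
        for block in chain
        for txid, sender, recipient in block
        if address in (sender, recipient)
    ]


def build_txindex(chain: list) -> dict:
    """Build a lookup table from txid to (block height, position)."""
    index = {}
    for height, block in enumerate(chain):
        for position, (txid, _sender, _recipient) in enumerate(block):
            index[txid] = (height, position)
    return index


print(rescan(chain, "alice"))
txindex = build_txindex(chain)
print(txindex["tx2"])
```

The rescan costs time on every import, while the index costs extra storage once and then answers lookups by transaction ID immediately.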
With Ethereum, both are possible, at least in theory. In practice, it is made more difficult by the fact that there is no UTXO set, only “accounts”. You can imagine that a balance in Bitcoin is a bucket full of coins, each stamped with its origin, while in Ethereum it is a bucket of water, of which only the level can be seen. This level can be changed by normal transactions, but also by the execution of smart contracts. This makes it more difficult to find past events in the blockchain.
Theoretically it is possible to reconstruct the history of accounts on demand. But since no Ethereum client has this option, you have to do it manually through the APIs. There is also an easier way: “You can also see old transaction paths by activating the ‘--tracing on’ option when synchronizing,” says Afri, “then the node stores the execution results of the transactions, which requires about 30-50% more disk space”.
IV. Why it is so difficult to trace the History of a Smart Contract
With tracing, an Ethereum node can look up the results of historical transactions. If we were talking about Bitcoin, almost everything would have been said. But we’re talking about Ethereum.
Afri explains that tracing cannot reconstruct every state of a smart contract. What state did an ICO contract have after 1,200 people had participated? What was the state of a contract like CryptoKitties at a certain block? It is very difficult to check this individually, especially if several smart contracts interact with each other. “To find out the state, it’s not enough just to look at the transactions.”
Of course, you could selectively compute only the states you need. But Afri says that no client has implemented this yet, and that it is very difficult. Instead, you can synchronize a node as a “Full Archival Node” (FAN): the node does not throw away the old states, but stores them on the hard disk. This takes quite a long time and requires an enormous amount of storage. One can imagine a company recalculating and printing its complete balance sheet after every transaction: every sale of goods, every salary payment made.
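The difference between a regular node and an archival node can be sketched as follows. This is a deliberately crude model: a block is a list of hypothetical `(account, delta)` changes, the function `sync` and its `archive` parameter are invented for this example, and a real client stores states far more compactly than as full copies.

```python
def sync(genesis: dict, chain: list, archive: bool = False):
    """Replay the chain; keep every intermediate state only if archive=True."""
    state = dict(genesis)
    history = [dict(state)] if archive else []
    for block in chain:
        for account, delta in block:
            state[account] = state.get(account, 0) + delta
        if archive:
            history.append(dict(state))  # one full snapshot per block
    return state, history


genesis = {"ico": 0}
chain = [[("ico", 5)], [("ico", 7)], [("ico", -2)]]

pruned_state, pruned_history = sync(genesis, chain)
archive_state, archive_history = sync(genesis, chain, archive=True)

print(pruned_state, len(pruned_history))
print(archive_history[2])  # state after the second block
```

Both nodes end up with the same current state, but only the archival node can answer what the contract looked like after the second block without replaying the chain. The price is one stored snapshot per block, which is where the storage demand comes from.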
For normal users, Afri thinks, this is unnecessary. “You only need it to retroactively track the state of smart contracts. I always say only scientists, criminal investigators and block explorers need that.” If you have such a FAN, you can query every historical state change of all smart contracts in real time.
V. A Terabyte-Devouring Monster Node
A FAN now needs more than a terabyte of storage. Afri estimates that demand will rise to just under two terabytes by the end of the year, and if Ethereum continues to grow as it currently does, a FAN will need eight or 16 terabytes in the foreseeable future.
For hobby users, a FAN is already too much. First, because it takes forever and a day to synchronize, and second, because an SSD of the necessary size is quite expensive. Since a FAN performs an extremely large number of read and write operations, wear is high. The drive should be replaced at least once a year.
Afri, however, is not worried that the FANs will die out. “The worst-case scenario is that it will become more expensive for block explorers. There’s always going to be someone with the resources to run one of those things.” What are two 16 terabyte drives a year for a lucrative company? The first 30 terabyte drives are already ready for the market, and 100 terabyte SSDs are about to be launched. The requirements are high, but feasible.
Afri is therefore more concerned about the normal full nodes without the archive of the historical states.
VI. Decentralization
With Ethereum, a full node currently requires about 100 gigabytes of storage. This is significantly less than a Bitcoin full node. However, an Ethereum full node consumes much more CPU power and RAM than a Bitcoin node and takes longer to synchronize. The state is more complex than the UTXO set.
But there’s nothing wrong at the moment. A decline in the number of nodes is not detectable. With about 17,000 open-port nodes, Ethereum has almost twice as many nodes as Bitcoin. So there is no reason to worry yet.
However, if Ethereum continues to grow as before, the size of a full node will grow to more than 500 gigabytes in the foreseeable future. According to Afri, this could be a pain threshold above which many hobbyists give up their full nodes. The pain is not only caused by storage requirements, but also by other factors such as synchronization, CPU load or hard disk operations. A full node already noticeably slows down many computers.
The alternative for users can be “warp nodes”. In Parity, these are synchronized with the option “--warp”. A warp node only downloads the state from other nodes plus a certain number of subsequent blocks, 30,000 by default. “Thanks to the state trie root hash in the block header, nobody can simply hand you a wrong state. It’s almost impossible to fake.” For the amounts that mere mortals receive and send, such a node is more than sufficiently secure. In addition, you can further increase security by changing peers and requesting information from various block explorers.
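Why a downloaded state is hard to fake can be shown with a strongly simplified sketch. Note the assumptions: Ethereum commits to the state with the root of a Merkle Patricia trie hashed with Keccak-256, while this example swaps in a plain SHA-256 hash over sorted JSON as a stand-in. The functions `state_root` and `accept_snapshot` are hypothetical names for this illustration and not part of any client.

```python
import hashlib
import json


def state_root(state: dict) -> str:
    """Stand-in commitment: SHA-256 over a canonical JSON encoding."""
    encoded = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def accept_snapshot(snapshot: dict, header_state_root: str) -> bool:
    """Accept a downloaded state only if it matches the block header."""
    return state_root(snapshot) == header_state_root


honest_state = {"alice": 65, "bob": 20}
header_root = state_root(honest_state)     # value taken from a block header

tampered_state = {"alice": 65, "bob": 2000}

print(accept_snapshot(honest_state, header_root))
print(accept_snapshot(tampered_state, header_root))
```

Because the block header is secured by the chain itself, a peer that hands over a manipulated state would also have to find a state with an identical hash, which is computationally infeasible for a cryptographic hash function.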
As with Bitcoin, full nodes in private hands act as a check on the system in Ethereum. If they disappear, the decentralization of Ethereum threatens to break down, and the cryptocurrency could become a pawn of cartel interests. When this would happen, and how bad it would really be, is a question that can be debated for a long time. It would be better, however, if the debate never became necessary because private full nodes still exist.
VII. Future
The goal of the Ethereum developers is therefore to keep the full nodes small enough that they can be operated by hobby users, but to create enough capacity not to block growth. This is the squaring of the circle, or, as Afri says, “the holy grail of the blockchain.”
It is “a technical problem to which there are many theoretical and some practical solutions”: First, sidechains, i.e. connecting other chains via bridges. That already exists, with Loom and the POA Network, but is still in its infancy. Second, state channels like Raiden, which means that transactions are executed off-chain. This already exists as a prototype, but is not yet as far along as the Lightning Network for Bitcoin. Third, work is under way on sharding, which means splitting the chain.
All these solutions are still a few years away, says Afri. He is optimistic that Ethereum will hold out that long, but he also believes that things could get tight. “We don’t want only large companies to be able to operate a node. There are currently no bottlenecks, but it could become critical in the next few years.”