TomoChain Client v1.4 release announcement.

This release includes the following important chaindata-size and security improvements:

  1. Chain-data pruning via the new “gc” (garbage collector) utility. Details are explained below.
  2. A security bug fix that prevents malicious masternodes from panicking the chain. The bug, reported through our security bug bounty program, allowed a malicious masternode to craft a special empty message during the randomization procedure that would cause all masternodes to panic. It was fixed in this pull request.
  3. Checkpoint timeout adjusted from 2 minutes to 20 seconds. The adjustment was made by this pull request.


In the next sections, we explain how data is stored on the TomoChain network, what drives chaindata growth, and how the new “garbage collector” (gc) utility removes the excessive historical data stored in the “BlockSigner” smart contract.

Modified Merkle Patricia Trie

In TomoChain, four Merkle tries maintain all blockchain data: the storage trie, the state trie, the transactions trie and the receipts trie. All of them use a modified Merkle Patricia trie (MPT, or simply trie).

Basically, an MPT is a combination of a Patricia trie and a Merkle tree. A Patricia trie is a tree-based data structure that stores (key, value) bindings in which a key represents a path, so nodes that share the same key prefix also share the same path. The advantage of this structure is that it is fast at finding common prefixes and does not take much memory to store data. A Merkle tree is a tree of hashes in which leaf nodes store data and each parent node stores a hash computed from the hashes of its children. A Merkle tree makes it easy to check whether two nodes hold the same data by comparing their top hash values.
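The Merkle tree half of that description can be sketched in a few lines of Python. This is only an illustration: SHA-256 stands in for the hash function TomoChain actually uses, and duplicating the last hash on odd-sized levels is a convention chosen for this sketch, not something taken from the TomoChain code.

```python
import hashlib


def sha256(data):
    """Hash helper (SHA-256 is a stand-in chosen for this sketch)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Return the top hash of a binary Merkle tree built over `leaves`."""
    if not leaves:
        return sha256(b"")
    # Leaf level: hash each piece of data.
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption of this sketch: pad odd levels by repeating the last hash.
            level.append(level[-1])
        # Parent level: each parent is the hash of its two children's hashes.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Two data sets are equal exactly when their top hashes match.
same = merkle_root([b"tx1", b"tx2", b"tx3"]) == merkle_root([b"tx1", b"tx2", b"tx3"])
different = merkle_root([b"tx1", b"tx2", b"tx3"]) != merkle_root([b"tx1", b"tx2", b"txX"])
```

Comparing two 32-byte roots replaces comparing every leaf, which is the property the paragraph above relies on.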

The Merkle Patricia trie in TomoChain provides a cryptographically authenticated data structure that can be used to store all (key, value) bindings. Its most important feature is that it is fully deterministic: a Patricia trie with the same (key, value) bindings is guaranteed to be exactly the same and therefore to have the same root hash. It is also efficient for insertions, lookups and deletions.
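To make the determinism property concrete, here is a toy Merkle-ized prefix trie in Python. It is a deliberately simplified sketch: it has no path compression (the real MPT uses dedicated leaf and extension nodes), it hashes with SHA-256 instead of the encoding and hash used by TomoChain, and every name in it is hypothetical.

```python
import hashlib


class Node:
    def __init__(self):
        self.children = {}  # hex nibble (str) -> Node
        self.value = None   # bytes stored at this key, if any


def insert(root, key, value):
    """Walk the key nibble by nibble; keys with a common prefix share nodes."""
    node = root
    for nibble in key.hex():
        if nibble not in node.children:
            node.children[nibble] = Node()
        node = node.children[nibble]
    node.value = value


def node_hash(node):
    """Hash a node from its own value and its children's hashes, in sorted order."""
    if node.value is None:
        parts = [b"\x00"]
    else:
        parts = [b"\x01" + len(node.value).to_bytes(4, "big") + node.value]
    for nibble in sorted(node.children):
        parts.append(nibble.encode() + node_hash(node.children[nibble]))
    return hashlib.sha256(b"".join(parts)).digest()


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children.values())


def build(bindings):
    root = Node()
    for key, value in bindings:
        insert(root, key, value)
    return root


bindings = [(b"\x12\x34", b"alice"), (b"\x12\x35", b"bob"), (b"\xab\xcd", b"carol")]
root_a = node_hash(build(bindings))
root_b = node_hash(build(list(reversed(bindings))))  # same bindings, other order
```

Here `root_a` and `root_b` are equal because the hash depends only on the final set of bindings, not on the order in which they were inserted. The keys `0x1234` and `0x1235` also share the nodes for their common prefix `123`.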

TomoChain’s state trie and BlockSigner smart contract

TomoChain’s block header contains three roots, one for each of three tries: stateRoot (state trie), transactionsRoot (transactions trie) and receiptsRoot (receipts trie).

The state trie contains the global state of the TomoChain blockchain, and it is updated over time by executing transactions. Its paths are constructed by SHA3[TomoChain address] and its values by RLP[TomoChain account]. Note that a TomoChain account is a 4-item array of [nonce, balance, storageRoot, codeHash], where storageRoot is the root of the storage trie mentioned above.
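The (path, value) pair for one account can be sketched as follows. The RLP encoder below follows the standard RLP rules used by Ethereum-derived clients, but it is a minimal illustration, not TomoChain's implementation. One loud assumption: in Ethereum-derived clients "SHA3" conventionally means Keccak-256, which Python's standard library does not provide, so `hashlib.sha3_256` (NIST SHA-3, which produces different digests) is used purely as a stand-in. The function names and the placeholder hashes are hypothetical.

```python
import hashlib


def encode_length(length, offset):
    """RLP length prefix: short form up to 55 bytes, long form above."""
    if length <= 55:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item):
    """Encode an int, a bytes value, or a (nested) list of those."""
    if isinstance(item, int):
        # Integers are big-endian with no leading zeros; 0 becomes b"".
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single small byte is its own encoding
        return encode_length(len(item), 0x80) + item
    payload = b"".join(rlp_encode(element) for element in item)
    return encode_length(len(payload), 0xC0) + payload


def state_key(address):
    """Path in the state trie. STAND-IN hash: the real client would use Keccak-256."""
    return hashlib.sha3_256(address).digest()


def encode_account(nonce, balance, storage_root, code_hash):
    """Value in the state trie: RLP of the 4-item account array."""
    return rlp_encode([nonce, balance, storage_root, code_hash])


# BlockSigner's address, 0x00...0089, as 20 bytes.
block_signer = (0x89).to_bytes(20, "big")
# Placeholder 32-byte roots; real values come from the storage trie and contract code.
placeholder_root = bytes(32)
key = state_key(block_signer)
value = encode_account(0, 0, placeholder_root, placeholder_root)
```

The point of the sketch is the shape of the data: a fixed-length hashed key leading to a compact serialized array whose third item points at another trie.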

TomoChain has several built-in smart contracts; one of them, called BlockSigner (address: 0x0000000000000000000000000000000000000089), contains the masternodes’ signing data. A masternode validates a block by sending an internal signing transaction to the BlockSigner smart contract. At the end of each epoch, TomoChain’s consensus calculates the signing counts in that smart contract to reward masternodes accordingly. The picture below illustrates the data stored in BlockSigner:

TomoChain’s state trie and BlockSigner internal state trie
  • BlockSigner is stored in a node of TomoChain’s global state trie at the specific address: 0x00…0089.
  • That node contains an internal trie at its data root, in which each node contains a masternode that validated a specific block.
  • For example, in this picture, three masternodes signed a block, identified by its hash, and are recorded on three separate nodes.

We found that the signing data was the main source of chaindata growth. In the previous release, v1.3, we therefore stopped using the BlockSigner smart contract to store signing data; instead, we count the compressed signing transactions directly, every 15 blocks. Because masternodes send signing transactions asynchronously, we scan blocks over a window of the two most recent epochs, which is equivalent to 1800 blocks. Furthermore, we cache signing transactions in memory to accelerate the reward calculation at the epoch block, so the checkpoint time (execution time at the epoch block) is also reduced significantly, from a maximum of 2 minutes down to 20 seconds.
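The window-based counting can be illustrated with a short Python sketch. Everything here is hypothetical: the `Block` and `SigningTx` shapes, the function name, and the choice to count each (signer, signed block) pair only once are assumptions made for illustration. The epoch length of 900 blocks is inferred from the statement that two epochs equal 1800 blocks. The 15-block compression and the in-memory cache are not modeled.

```python
from collections import Counter
from dataclasses import dataclass, field

EPOCH = 900         # blocks per epoch, inferred from "two epochs = 1800 blocks"
WINDOW = 2 * EPOCH  # number of recent blocks scanned at a checkpoint


@dataclass
class SigningTx:
    signer: str        # masternode address (hypothetical representation)
    signed_block: int  # number of the block this transaction signs


@dataclass
class Block:
    number: int
    signing_txs: list = field(default_factory=list)


def count_signatures(blocks, checkpoint):
    """Count signatures per masternode in the WINDOW blocks ending at `checkpoint`."""
    start = max(0, checkpoint - WINDOW)
    seen = set()
    counts = Counter()
    for block in blocks:
        if not (start < block.number <= checkpoint):
            continue  # outside the two-epoch window
        for tx in block.signing_txs:
            pair = (tx.signer, tx.signed_block)
            if pair in seen:
                continue  # assumption: a repeated signature is counted once
            seen.add(pair)
            counts[tx.signer] += 1
    return counts


chain = [
    Block(10, [SigningTx("A", 9), SigningTx("B", 9)]),
    Block(11, [SigningTx("A", 10), SigningTx("A", 9)]),  # ("A", 9) arrives twice
    Block(1801, [SigningTx("B", 1800)]),                 # after the checkpoint
]
counts = count_signatures(chain, checkpoint=1800)
```

Scanning a bounded window of block bodies means the count no longer depends on contract storage, which is why the signing data does not need to live in the state trie.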

Introducing gc (garbage collector)

In this release, we introduce the gc utility, which cleans the historical BlockSigner state trie and BlockSigner addresses in the TomoChain global state root. We have already run it on TomoChain’s masternodes, and the result was significant: chaindata size dropped from 300+ GB to 72 GB (recorded on April 23rd, 2019). However, be aware that this process is slow, and the node has to be stopped while gc runs because it scans every block starting from genesis. We ran gc on a powerful server (32 CPU cores and 96 GB RAM) and it took 6 days to finish. To make things easier for masternode operators, we will publish the pruned chaindata from our masternodes so that it can be used quickly on any node.

You can choose to run gc on your own chaindata with the following commands:

$ git clone https://github.com/tomochain/tomochain && \
cd tomochain && \
git checkout tags/v1.4
$ make gc
$ ./build/bin/gc -dir path/to/your/chaindata