While investigating this I've realised that the concurrency issues go beyond just the head tracker, and that "fixing the head tracker" alone is not feasible or desirable.
The two main places where we update the head tracker are:

- When importing blocks (`import_block`) -- we add the new block to the head tracker and remove its parent, which may or may not be present (the parent could be an older non-head block).
- When pruning (`prune_abandoned_forks`) -- we delete all the heads that were pruned from the database.
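To make those two operations concrete, here is a minimal sketch of a head tracker as a set of block roots, updated the way the two call sites above describe. This is an illustrative toy (the class and method bodies are invented, not Lighthouse's actual implementation):

```python
class HeadTracker:
    """Toy head tracker: the set of current chain heads (block roots).
    Hypothetical sketch, not the real implementation."""

    def __init__(self):
        self.heads = set()

    def import_block(self, block_root, parent_root):
        # The new block becomes a head; its parent (if it was a head)
        # no longer is. The parent may be absent from the set -- it
        # could be an older non-head block -- so discard() is a no-op
        # in that case.
        self.heads.add(block_root)
        self.heads.discard(parent_root)

    def prune_abandoned_forks(self, pruned_heads):
        # Remove every head that pruning deleted from the database.
        self.heads -= set(pruned_heads)


tracker = HeadTracker()
tracker.import_block("B1", "B0")
tracker.import_block("B2", "B1")   # B2 replaces B1 as a head
tracker.import_block("C1", "B0")   # fork off B0: a second head appears
tracker.prune_abandoned_forks(["C1"])
print(sorted(tracker.heads))       # ['B2']
```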
The issue Adam highlighted is definitely a problem: a beacon node could interleave updating a head (via block import) with pruning heads, which could leave a head in the head tracker whose parent (and abandoned ancestors) have been pruned from the database. The real issue is actually concurrent database access, as the following possible execution shows:
1. Start processing a block `B2` that has parent `B1`.
2. Before processing of `B2` completes, pruning deletes `B1` and its associated data from the on-disk database.
3. Processing of `B2` completes, which results in `B2` and its state being stored in the database.

End result: the invariant that a block being in the DB implies its parent is also in the DB is violated (because `B2` is in the DB but `B1` isn't). The atomicity of the individual writes of (2) and (3) does nothing to help us, because the writes are only related semantically by this invariant, and don't actually conflict.
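The trace can be reproduced deterministically with a toy database (a sketch with invented names; note that each individual write here is atomic, yet the invariant still breaks):

```python
def parent_invariant_holds(db):
    # Invariant: every non-genesis block in the DB has its parent in the DB.
    return all(parent is None or parent in db for parent in db.values())

# Toy on-disk database: block root -> parent root (None for genesis).
db = {"B0": None, "B1": "B0"}

# (1) Start processing B2: its parent B1 is present, so import proceeds.
parent_of_b2 = "B1"
assert parent_of_b2 in db

# (2) Before B2's import completes, pruning deletes B1 and its data.
del db["B1"]

# (3) B2's import completes: B2 is written atomically.
db["B2"] = parent_of_b2

# Each write was individually atomic, but the invariant is now violated:
print(parent_invariant_holds(db))  # False: B2 is in the DB, B1 isn't
```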
## Using a Big Lock
A simple solution would be to wrap block importing and pruning in one big lock, so that interleaving of their internal steps becomes impossible, something like:
```
import_block:
  - Lock A
  - Write new block to database
  - Write block to head tracker
  - Unlock A

prune_abandoned_heads:
  - Lock A
  - Delete abandoned heads from database
  - Delete abandoned heads from the head tracker
  - Unlock A
```
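A sketch of that big lock (Python threading, with hypothetical function bodies): holding one mutex across both the database write and the head-tracker update makes the bad interleaving impossible, at the cost of serialising all imports against pruning:

```python
import threading

lock = threading.Lock()          # "Lock A"
db = {"B0": None, "B1": "B0"}    # block root -> parent root
heads = {"B1"}

def import_block(block_root, parent_root):
    with lock:                            # Lock A
        db[block_root] = parent_root      # write new block to database
        heads.add(block_root)             # write block to head tracker
        heads.discard(parent_root)
                                          # Unlock A (on leaving `with`)

def prune_abandoned_heads(abandoned):
    with lock:                            # Lock A
        for head in abandoned:
            db.pop(head, None)            # delete head from database
            heads.discard(head)           # delete head from head tracker
                                          # Unlock A

import_block("B2", "B1")
print(heads)  # {'B2'}
```

Because pruning cannot run between the database write and the head-tracker write (or vice versa), the violating trace above is ruled out -- but so is every harmless concurrent execution.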
This prevents consistency issues by making the execution trace above impossible. However, it is very coarse-grained and involves substantial risk: no blocks can be imported whilst pruning runs. If pruning runs for an extended period, the node could end up stalled, unable to import new blocks while it waits for pruning to complete.
The coarse-grained locking is the issue here, and prohibits perfectly safe executions like this:
1. Start processing a block `B` on the canonical chain (or some viable fork).
2. Start pruning the database, deleting lots of non-viable forks, but no data related to `B`.
3. Finish processing `B`, store it in the database.
I thought a bit about a data-structure for head-tracking that allows each head to be locked/mutated/deleted individually, but this still doesn't solve the database consistency issue.
The database consistency issue might be best addressed by using a read-write transaction that fails to commit if the data it read has been deleted in the meantime. This sort of transaction isn't available in LevelDB, but might be in LMDB/RocksDB (which raises the question of #483 again). Another complication is caching: if the transaction doesn't need to read the parent block/state because it is cached, then it might not conflict "correctly" (although I think this is easy to work around). More research and thought is required, and this comment is getting long enough... TBC
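To sketch what such a transaction might look like (this is generic optimistic concurrency control with invented names, not the LevelDB/LMDB/RocksDB API): the transaction records a version for everything it reads, and commit fails if any of those keys has been written or deleted since:

```python
class Store:
    """Toy versioned key-value store with optimistic read-write
    transactions. Invented for illustration only."""

    def __init__(self):
        self.data = {}      # key -> value
        self.version = {}   # key -> write counter (bumped on write/delete)

    def begin(self):
        return Txn(self)

class Txn:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version observed at read time
        self.writes = {}

    def read(self, key):
        self.read_versions[key] = self.store.version.get(key, 0)
        return self.store.data.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Fail if anything we read was written or deleted since we read it.
        for key, seen in self.read_versions.items():
            if self.store.version.get(key, 0) != seen:
                return False
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True


store = Store()
store.data["B1"] = "block B1"
store.version["B1"] = 1

# Import of B2 runs as a transaction: read the parent, write the child.
txn = store.begin()
assert txn.read("B1") is not None  # parent exists, safe to import B2
txn.write("B2", "block B2")

# Pruning deletes B1 before the import transaction commits.
del store.data["B1"]
store.version["B1"] += 1

print(txn.commit())  # False: the parent the import relied on is gone
```

Note that this only works because the parent was fetched through `txn.read`; if `B1` were served from an in-memory cache instead, the deletion would go undetected, which is exactly the caching complication mentioned above.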