This one is a bear. I believe what is happening is the following:
Your code is creating and deleting large numbers of sequential nodes. At time T, it is in the process of deleting a bunch of nodes when ZK decides to take a snapshot of the state.
When we take a snapshot, we spawn a separate thread and serialize the nodes of the tree in that thread. We get to your /zkrsm node in DataTree.serializeNode, get that node from the tree, synchronize on it, and write out the record of that node, including its current cversion (used to generate sequential node names) and the list of children. However, we then release the lock on that node and attempt to iterate through the children to serialize them out. In the meantime, the other thread is merrily deleting children of this node, increasing the cversion of /zkrsm all the while. So the list of children that we got while serializing the parent is stale. When we try to serialize these now-deleted children, we see that they are null and continue on.
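To make the interleaving concrete, here is a minimal single-threaded sketch of it. This is not the real DataTree code: the class, the `Node` type, and the `concurrentWork` hook are all hypothetical, and the "other thread" is simulated by a callback that runs right after the parent's lock is released.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class SnapshotRaceSketch {
    /** Hypothetical stand-in for a znode: only the fields this scenario needs. */
    static final class Node {
        int cversion;
        final TreeSet<String> children = new TreeSet<>();
    }

    final Map<String, Node> tree = new HashMap<>();

    void create(String parent, String child) {
        Node p = tree.get(parent);
        synchronized (p) {
            p.children.add(child);
            p.cversion++;
        }
        tree.put(parent + "/" + child, new Node());
    }

    void delete(String parent, String child) {
        Node p = tree.get(parent);
        tree.remove(parent + "/" + child);
        synchronized (p) {
            p.children.remove(child);
            p.cversion++;
        }
    }

    /**
     * Writes the node under its lock, releases the lock, runs concurrentWork
     * (standing in for the deleting thread), then walks the child list that
     * was copied while the lock was held. Returns the records written.
     */
    List<String> serialize(String path, Runnable concurrentWork) {
        List<String> written = new ArrayList<>();
        Node node = tree.get(path);
        if (node == null) {
            return written; // child deleted since the parent was written: skip it
        }
        List<String> childNames;
        synchronized (node) {
            written.add(path + " cversion=" + node.cversion);
            childNames = new ArrayList<>(node.children);
        }
        concurrentWork.run(); // lock released: deletes can land here
        for (String child : childNames) {
            written.addAll(serialize(path + "/" + child, () -> { }));
        }
        return written;
    }

    public static void main(String[] args) {
        SnapshotRaceSketch t = new SnapshotRaceSketch();
        t.tree.put("/zkrsm", new Node());
        for (String c : new String[] {"n0", "n1", "n2"}) {
            t.create("/zkrsm", c);
        }
        List<String> snapshot = t.serialize("/zkrsm", () -> {
            t.delete("/zkrsm", "n0");
            t.delete("/zkrsm", "n1");
        });
        System.out.println("snapshot: " + snapshot);
        System.out.println("live cversion: " + t.tree.get("/zkrsm").cversion);
    }
}
```

In this toy run the snapshot records /zkrsm at cversion 3 with only n2 beneath it, while the live tree has already moved on to cversion 5.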
Now, you finish this snapshot, delete some more nodes under /zkrsm, create some more sequential nodes under /zkrsm, and crash. When you start back up again, you read that snapshot and start playing through the log transactions after the snapshot zxid. Unfortunately, the first N transactions in your log after the snapshot zxid are deletions of nodes that didn't make it into the snapshot because you deleted them before they could be serialized. We try to process each delete transaction and get a NoNodeException, but ignore it because we know that can happen for the reason described above. What we don't do is increment the cversion of the parent node after this failed deletion. So the parent's cversion ends up lower than it would be if you replayed the whole transaction log from the start, and lower than it was in the system before the crash. Now you want to continue creating sequential nodes where you left off, but your cversion is wrong, so you try to create a node that already exists. Whoops.
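The recovery path can be walked through with a small model. Everything here is an assumption made for illustration: the `Parent` class, the `n` prefix, the zero-padded 10-digit suffix, and the specific numbers (three nodes created, two deleted during the snapshot, one more created before the crash, so the pre-crash cversion is 6).

```java
import java.util.TreeSet;

public class ReplayDivergenceSketch {
    /** Assumed naming scheme: a prefix plus the zero-padded sequence number. */
    static String name(int seq) {
        return String.format("n%010d", seq);
    }

    /** Hypothetical model of the parent znode: child names plus cversion. */
    static final class Parent {
        int cversion;
        final TreeSet<String> children = new TreeSet<>();

        /** The next sequential name is derived from the current cversion. */
        String nextSequentialName() {
            return name(cversion);
        }

        /** Returns false when the name already exists (the NodeExists case). */
        boolean create(String child) {
            if (!children.add(child)) {
                return false;
            }
            cversion++;
            return true;
        }

        /** Returns false on a missing child (NoNode) and leaves cversion alone. */
        boolean delete(String child) {
            if (!children.remove(child)) {
                return false;
            }
            cversion++;
            return true;
        }
    }

    /** Loads the fuzzy snapshot, then replays the transactions logged after it. */
    static Parent recover() {
        Parent p = new Parent();
        // Snapshot: parent written at cversion 3, but only one child was serialized.
        p.cversion = 3;
        p.children.add(name(2));
        // Log: two deletes that now hit NoNode, then the create done before the crash.
        p.delete(name(0));
        p.delete(name(1));
        p.create(name(5));
        return p;
    }

    public static void main(String[] args) {
        Parent p = recover();
        System.out.println("recovered cversion: " + p.cversion);
        String first = p.nextSequentialName();
        System.out.println(first + " created: " + p.create(first));
        String second = p.nextSequentialName();
        System.out.println(second + " created: " + p.create(second));
    }
}
```

Under these assumptions the parent comes back at cversion 4 instead of 6. The first new sequential create happens to succeed, and the second one computes a name that is already taken by the node created before the crash.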
So, now we just need to fix it. Should we be incrementing the cversion of the parent even on a NoNode exception during txn log replay? I suspect that is the right thing to do. Thoughts?
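As a rough sketch of that idea only (not a patch against the real replay code), the toy below replays the same kind of log twice, once leaving cversion alone on a missing child and once bumping it anyway. The method name, the `bumpOnNoNode` flag, and the numbers are hypothetical; the pre-crash cversion in this scenario is assumed to be 6.

```java
import java.util.HashSet;
import java.util.Set;

public class ReplayFixSketch {
    /**
     * Replays two deletes and one create against a snapshot whose parent was
     * written at cversion 3 with a single surviving child. Returns the
     * parent's cversion after replay.
     */
    static int replay(boolean bumpOnNoNode) {
        int cversion = 3;
        Set<String> children = new HashSet<>();
        children.add("n2");

        String[] deletes = {"n0", "n1"}; // both missing from the snapshot
        for (String child : deletes) {
            boolean existed = children.remove(child);
            // Proposed change: count the delete even when the child is gone.
            if (existed || bumpOnNoNode) {
                cversion++;
            }
        }

        children.add("n5"); // the create logged before the crash
        cversion++;
        return cversion;
    }

    public static void main(String[] args) {
        System.out.println("without bump: " + replay(false));
        System.out.println("with bump:    " + replay(true));
    }
}
```

With the bump, the replayed cversion matches the assumed pre-crash value in this scenario. The toy does not cover the case where a delete had already been applied before the parent itself was serialized, where an unconditional bump could overcount, so that interleaving would need checking before settling on this.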