Description
After HDFS-6527, we did not see the edit log corruption for weeks on multiple clusters, until yesterday. Previously, it would show up within 30 minutes on a cluster.
But the same condition was reproduced even with HDFS-6527 applied. The only explanation is that the RPC handler thread serving addBlock() was reading a stale value of parent. Although parent is nulled out while holding the FSNamesystem and FSDirectory write locks, there is no memory barrier because no "synchronized" block is involved in the process.
I suggest making parent volatile.
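To illustrate the suggestion, here is a minimal sketch, not the actual HDFS code or patch. The class Node and the methods detach(), isAttached() and deleteUnderWriteLock() are hypothetical stand-ins. It assumes the writer nulls out parent under a write lock while the reader checks it without taking any lock. With the field declared volatile, the reader is guaranteed to eventually observe the null.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class VolatileParentSketch {
    /** Hypothetical stand-in for an inode; not the real HDFS INode class. */
    static final class Node {
        final String name;
        // volatile: a write by one thread is visible to subsequent reads by
        // any other thread, even when the reader holds no lock.
        private volatile Node parent;

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }

        Node getParent() { return parent; }
        void detach() { parent = null; }
        boolean isAttached() { return parent != null; }
    }

    // Assumed stand-in for the namesystem/directory write lock.
    static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Writer side: nulls out parent while holding the write lock. */
    static void deleteUnderWriteLock(Node node) {
        lock.writeLock().lock();
        try {
            node.detach();
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Node dir = new Node("dir", null);
        Node file = new Node("file", dir);

        Thread deleter = new Thread(() -> deleteUnderWriteLock(file));
        deleter.start();

        // Reader side: checks parent WITHOUT taking the lock. Because the
        // field is volatile, this loop is guaranteed to see the update.
        while (file.isAttached()) {
            Thread.yield();
        }
        deleter.join();
        System.out.println("attached after delete: " + file.isAttached());
    }
}
```

If the volatile modifier is removed, the Java memory model no longer guarantees that the unlocked reader loop ever sees the write, which is the kind of stale read described above.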
Attachments
Issue Links
- is depended upon by: HDFS-6647 Edit log corruption when pipeline recovery occurs for deleted file present in snapshot (Closed)