Description
1. A client (regionserver) had opened a stream to write its WAL to HDFS. This is not a one-time upload; data is written slowly over time.
2. One of the DataNodes ran out of disk space (other data had filled up its disks).
3. Unfortunately, the block was being written only to this DataNode, so the client write also failed.
4. After some time, disk space was freed and all processes were restarted.
5. The HMaster then tried to recover the file by calling recoverLease.
At this point, recovery failed, reporting a file length mismatch.
When checked,
actual block file length: 62484480
Calculated block length: 62455808
This was because the meta file had CRCs for only 62455808 bytes, so 62455808 was taken as the block length.
No matter how many times recovery was retried, it kept failing.
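
The mismatch above can be reproduced with a little arithmetic. The sketch below is only an illustration, not the DataNode's actual code: it assumes 512-byte checksum chunks, 4-byte CRCs, and a 7-byte meta file header (none of which are stated in this report), and the function name `length_covered_by_meta` and the meta file length are hypothetical.

```python
# Assumptions (not stated in the report): 512-byte checksum chunks,
# 4-byte CRC per chunk, and a 7-byte meta file header.
BYTES_PER_CHECKSUM = 512
CHECKSUM_SIZE = 4
META_HEADER_SIZE = 7


def length_covered_by_meta(meta_file_len, block_file_len):
    """Return how many bytes of the block file have a CRC in the meta file."""
    crc_bytes = max(meta_file_len - META_HEADER_SIZE, 0)
    num_chunks = crc_bytes // CHECKSUM_SIZE
    # The covered length can never exceed what is actually on disk.
    return min(num_chunks * BYTES_PER_CHECKSUM, block_file_len)


# Actual block file length from the report.
block_file_len = 62484480
# Hypothetical meta file length, chosen so that it holds CRCs for exactly
# the calculated block length from the report.
meta_file_len = META_HEADER_SIZE + (62455808 // BYTES_PER_CHECKSUM) * CHECKSUM_SIZE

covered = length_covered_by_meta(meta_file_len, block_file_len)
uncovered = block_file_len - covered

print("covered by CRCs:", covered)
print("bytes without a CRC:", uncovered)
print("chunks without a CRC:", uncovered // BYTES_PER_CHECKSUM)
```

Under these assumptions, the 28672-byte difference between the two lengths corresponds to data that reached the block file without its checksums being written to the meta file, which would be consistent with the disk filling up in the middle of the write.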