When reading a file that is corrupt on HDFS (known to the namenode, but with part of its data missing from the datanodes), some of the assertions in dfs_read fail and fuse_dfs aborts. The mounted filesystem then stays inaccessible until it is remounted.
A simple way to reproduce this bug is to remove enough datanodes that part of the data is missing, then read every file listed in HDFS.
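The "read every file" step can be automated through the mount point. Below is a minimal C sketch, assuming HDFS is mounted via fuse_dfs at a path passed in by the caller; the helper name `read_tree` is hypothetical. It walks the tree with `opendir`/`readdir` and reads each regular file to EOF, counting files that cannot be fully read. With the bug present, fuse_dfs would be expected to abort partway through the walk, after which further reads on the mount fail.

```c
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: reads every regular file under `root` to EOF.
 * Returns the number of entries that could not be fully read,
 * or -1 if `root` itself cannot be opened as a directory. */
static int read_tree(const char *root)
{
    static char buf[65536]; /* shared scratch buffer; contents are discarded */
    char path[4096];
    struct dirent *entry;
    int failures = 0;

    DIR *dir = opendir(root);
    if (dir == NULL) {
        perror(root);
        return -1;
    }

    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        if (snprintf(path, sizeof path, "%s/%s", root, entry->d_name) >= (int)sizeof path) {
            failures++; /* path too long to build */
            continue;
        }

        struct stat st;
        if (stat(path, &st) != 0) {
            perror(path);
            failures++;
            continue;
        }
        if (S_ISDIR(st.st_mode)) {
            int sub = read_tree(path); /* recurse into subdirectories */
            failures += (sub < 0) ? 1 : sub;
            continue;
        }
        if (!S_ISREG(st.st_mode))
            continue;

        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror(path);
            failures++;
            continue;
        }
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            ; /* only whether the read succeeds matters here */
        if (n < 0) {
            fprintf(stderr, "%s: read failed: %s\n", path, strerror(errno));
            failures++;
        }
        close(fd);
    }
    closedir(dir);
    return failures;
}
```

Calling it as `read_tree("/mnt/hdfs")` (mount path assumed) after removing datanodes should exercise every file once.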
This is the assertion that fails (fuse_dfs.c:903): assert(bufferReadIndex >= 0 && bufferReadIndex < fh->bufferSize);
The expected behaviour would be for the read to return either nothing or corrupt data, with the mount continuing to work afterward.
Removing the assertion seems to work for now, but this case probably needs dedicated handling to be dealt with correctly.
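One possible form of that handling is to check the index and fail only the affected read, which a FUSE read handler does by returning a negative errno (here `-EIO`) instead of a byte count. The following is a minimal C sketch of the idea; the `dfs_buffer` struct, its `buf` and `startOffset` fields, and `copy_from_buffer` are hypothetical stand-ins loosely modelled on the names in the failing assertion, not the actual fuse_dfs code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the per-file read buffer in fuse_dfs. */
struct dfs_buffer {
    char   *buf;         /* data previously read from HDFS */
    size_t  bufferSize;  /* number of valid bytes in buf */
    off_t   startOffset; /* file offset corresponding to buf[0] */
};

/* Copies up to `size` bytes at file offset `offset` out of the buffer.
 * Returns the number of bytes copied, or -EIO when the offset lies outside
 * the buffered range (e.g. because blocks were missing and the underlying
 * read came back short), rather than aborting the process via assert().
 * Assumes `size` fits in an int, as FUSE read requests do. */
static int copy_from_buffer(const struct dfs_buffer *fh, off_t offset,
                            char *out, size_t size)
{
    off_t bufferReadIndex = offset - fh->startOffset;

    /* Same condition the assertion checked, turned into an error return. */
    if (bufferReadIndex < 0 || (size_t)bufferReadIndex >= fh->bufferSize)
        return -EIO;

    size_t available = fh->bufferSize - (size_t)bufferReadIndex;
    size_t amount = size < available ? size : available;
    memcpy(out, fh->buf + bufferReadIndex, amount);
    return (int)amount;
}
```

Returning an error keeps the daemon alive, so only the corrupt file becomes unreadable; whether partially available data should be returned instead is a separate design decision.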
Related to HADOOP-4635 (Memory leak?)