Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 0.15.1
- Component/s: None
- Labels: None
Description
If the name node crashes after blocks have been allocated but before the content has been uploaded, fsck will report the zero-sized files as corrupt upon restart:
/user/rajive/rand0/_task_200712121358_0001_m_000808_0/part-00808: MISSING 1 blocks of total size 0 B
... even though all blocks are accounted for:
Status: CORRUPT
Total size: 2932802658847 B
Total blocks: 26603 (avg. block size 110243305 B)
Total dirs: 419
Total files: 5031
Over-replicated blocks: 197 (0.740518 %)
Under-replicated blocks: 0 (0.0 %)
Target replication factor: 3
Real replication factor: 3.0074053
The filesystem under path '/' is CORRUPT
In UFS and related filesystems, such files would get put into lost+found after an fsck and the filesystem would return to normal. It would be super if HDFS could do a similar thing. Perhaps once all of the nodes listed in the name node's 'includes' file have reported in, HDFS could automatically run an fsck and store these not-necessarily-broken files in something like lost+found.
Files that are actually missing blocks, however, should not be touched.
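To make the requested behaviour concrete, here is a minimal sketch in plain Java (not actual HDFS code; FileReport, isSafeToSalvage and salvage are hypothetical names): once every datanode in the 'includes' file has reported in, a file whose only "missing" blocks are zero-length allocations would be moved to something like lost+found, while a file that has lost real data is left alone.

import java.util.List;

class LostFoundSketch {

    /** Minimal stand-in for the per-file information fsck would gather. */
    static class FileReport {
        final String path;
        final long totalSize;               // bytes the name node believes the file holds
        final List<Long> missingBlockSizes; // expected sizes of blocks no datanode reported
        FileReport(String path, long totalSize, List<Long> missingBlockSizes) {
            this.path = path;
            this.totalSize = totalSize;
            this.missingBlockSizes = missingBlockSizes;
        }
    }

    /** True only if every missing block was never written to (expected size 0). */
    static boolean isSafeToSalvage(FileReport r) {
        if (r.missingBlockSizes.isEmpty()) {
            return false;                    // nothing missing, nothing to salvage
        }
        for (long size : r.missingBlockSizes) {
            if (size > 0) {
                return false;                // real data is gone; do not touch this file
            }
        }
        return r.totalSize == 0;
    }

    /** Hypothetical driver, run after all 'includes' datanodes have checked in. */
    static void salvage(List<FileReport> reports) {
        for (FileReport r : reports) {
            if (isSafeToSalvage(r)) {
                System.out.println("move to /lost+found: " + r.path);
            } else if (!r.missingBlockSizes.isEmpty()) {
                System.out.println("leave as corrupt:    " + r.path);
            }
        }
    }
}

With the fsck output above, part-00808 (1 missing block, total size 0 B) would be salvaged, while any file missing a block with a non-zero expected size would remain flagged as corrupt.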
Issue Links
- relates to: HADOOP-2703 New files under lease (before close) still shows up as MISSING files/blocks in fsck (Closed)