Hadoop HDFS / HDFS-12662

lost+found strategy for bad/corrupt blocks to improve data replication 'SLA' for small clusters


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.8.1
    • Fix Version/s: None
    • Component/s: block placement
    • Labels: None

    Description

      Corrupt blocks currently have to be removed manually, and while they remain on disk they effectively block the data node on which they reside from receiving a good copy of the same block. In small clusters (i.e., node count == replication factor), this prevents the name node from finding a free data node on which to restore the desired replication level until the user manually runs an fsck command to remove the corrupt block.

      I suggest moving the corrupt block out of the way, as is usually done on ext2-based filesystems, i.e., moving the block into a lost+found directory so that the name node can replace it immediately. Alternatively, a configuration option could allow fixing corrupt blocks in place: hard disks usually remap bad sectors internally on their own, so a simple rewrite can often repair such blocks.
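      As a rough illustration, a quarantine step on the data node side could look like the sketch below (class name, directory layout, and the .meta naming are illustrative assumptions, not actual DataNode internals):

          import java.io.IOException;
          import java.nio.file.Files;
          import java.nio.file.Path;
          import java.nio.file.Paths;
          import java.nio.file.StandardCopyOption;

          // Hypothetical sketch of the proposal: instead of leaving a corrupt
          // replica in place (where it blocks re-replication onto this node),
          // move it into a per-volume lost+found directory so the name node
          // can immediately schedule a fresh copy onto this data node.
          public class CorruptReplicaQuarantine {

              public static void quarantine(Path volumeRoot, Path blockFile) throws IOException {
                  Path lostFound = volumeRoot.resolve("lost+found");
                  Files.createDirectories(lostFound);

                  // Move the block file itself, e.g. blk_1073741825.
                  Files.move(blockFile, lostFound.resolve(blockFile.getFileName()),
                          StandardCopyOption.ATOMIC_MOVE);

                  // Move the checksum sidecar if present (real replicas use
                  // blk_<id>_<genstamp>.meta; this naming is simplified).
                  Path meta = Paths.get(blockFile.toString() + ".meta");
                  if (Files.exists(meta)) {
                      Files.move(meta, lostFound.resolve(meta.getFileName()),
                              StandardCopyOption.ATOMIC_MOVE);
                  }
                  // Once the replica is out of the block pool directory, it is
                  // no longer reported for this block, so the data node becomes
                  // a valid re-replication target again.
              }
          }

      Because lost+found sits on the same volume as the replica, the move is a cheap rename rather than a copy, mirroring how e2fsck parks unrecoverable inodes under /lost+found instead of deleting them.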


          People

            Assignee: Unassigned
            Reporter: Gruust
            Votes: 0
            Watchers: 2
