  Hadoop HDFS / HDFS-12662

lost+found strategy for bad/corrupt blocks to improve data replication 'SLA' for small clusters


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.8.1
    • Fix Version/s: None
    • Component/s: block placement
    • Labels:
      None

      Description

      Corrupt blocks currently have to be removed manually, and as long as a corrupt replica remains on a data node it effectively blocks that data node from receiving a good copy of the same block. In small clusters (i.e. node count == replication factor), this prevents the name node from finding a free data node to restore the desired replication level until the user manually runs an fsck command to remove the corrupt block.
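
      For reference, a minimal sketch of the manual workaround described above, using the public FileSystem API (this assumes fs.defaultFS points at the HDFS cluster; listCorruptFileBlocks is only implemented by DistributedFileSystem):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// List files that contain corrupt blocks and optionally delete them, so the
// name node stops waiting for a replica it cannot place on a free data node.
public class ListCorruptBlocks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf)) {
      RemoteIterator<Path> corrupt = fs.listCorruptFileBlocks(new Path("/"));
      while (corrupt.hasNext()) {
        Path p = corrupt.next();
        System.out.println("file with corrupt block: " + p);
        // The drastic manual step, roughly what `hdfs fsck <path> -delete` does:
        // fs.delete(p, false);
      }
    }
  }
}
{code}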

      I suggest moving the corrupt block out of the way, as ext2-based filesystems usually do, i.e. moving the block file into a lost+found directory so that the name node can replace it on that data node immediately. Alternatively, add a configuration option that allows corrupt blocks to be fixed in place: hard disks usually remap bad sectors internally on their own, so simply rewriting the block can often fix the problem.
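
      A minimal sketch of the proposed data-node-side behaviour (purely illustrative: the class, method name, and lost+found directory below do not exist in HDFS today):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: instead of leaving a corrupt replica in place, move the
// block file and its checksum .meta file into a per-volume lost+found
// directory, freeing the data node to accept a fresh, good replica.
final class CorruptReplicaQuarantine {
  static void quarantine(Path volumeRoot, Path blockFile, Path metaFile)
      throws IOException {
    Path lostFound = volumeRoot.resolve("lost+found");
    Files.createDirectories(lostFound);
    // Move rather than delete, so an operator can still inspect or salvage
    // the data; the move stays on the same volume, so it is cheap and atomic.
    Files.move(blockFile, lostFound.resolve(blockFile.getFileName()),
        StandardCopyOption.ATOMIC_MOVE);
    if (Files.exists(metaFile)) {
      Files.move(metaFile, lostFound.resolve(metaFile.getFileName()),
          StandardCopyOption.ATOMIC_MOVE);
    }
  }
}
{code}

      The in-place repair option from the second paragraph could reuse the same hook: rewrite the block from a good replica instead of moving it aside, letting the disk remap any bad sectors during the rewrite.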

            People

            • Assignee:
              Unassigned
            • Reporter:
              gruust Gruust
            • Votes:
              0
            • Watchers:
              2
