Hadoop HDFS / HDFS-13709

Report bad block to NN when transfer block encounters EIO exception


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0, 3.2.1, 3.1.3
    • Component/s: datanode
    • Labels:
      None

      Description

      In our production cluster, the BlockPoolSliceScanner is turned off, so bad tracks on a disk can go undetected and sometimes cause data loss.

      For example, suppose a block has 3 replicas on 3 machines A, B, and C. If a bad track corrupts A's replica data, and later B and C crash at the same time, the NN will try to replicate the block from A and fail. The block is now corrupt, but no one knows it: the NN believes there is still at least 1 healthy replica, so it keeps trying to replicate from it.
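      The failure scenario above can be modeled as a toy simulation of the NN's view of the three replicas. This is an illustrative sketch only: the `State` enum, the `ReplicaScenario` class, and `usableSources` are hypothetical and do not correspond to real NameNode classes. It shows that while A's bad track goes unreported, the NN still counts one usable replication source; only a bad-block report lets it see that none remain.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplicaScenario {
    // Hypothetical replica states as seen by the NameNode; not the real HDFS model.
    enum State { HEALTHY, CORRUPT, LOST }

    // Count replicas the NameNode still considers usable replication sources.
    static long usableSources(Map<String, State> view) {
        return view.values().stream().filter(s -> s == State.HEALTHY).count();
    }

    public static void main(String[] args) {
        Map<String, State> nnView = new HashMap<>();
        nnView.put("A", State.HEALTHY); // A's data sits on a bad track, but nobody has reported it
        nnView.put("B", State.HEALTHY);
        nnView.put("C", State.HEALTHY);

        // B and C crash at the same time.
        nnView.put("B", State.LOST);
        nnView.put("C", State.LOST);

        // Without a report, A still looks healthy, so the NN keeps retrying replication from it.
        System.out.println("usable sources before report: " + usableSources(nnView)); // prints 1

        // Once A's DataNode reports the block after hitting EIO, the NN can mark it corrupt.
        nnView.put("A", State.CORRUPT);
        System.out.println("usable sources after report: " + usableSources(nnView)); // prints 0
    }
}
```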

      When a DN reads a replica whose data sits on a bad track, the OS returns an EIO error. If the DN reports the bad block to the NN as soon as it gets an EIO, we can detect this case as soon as possible and try to avoid data loss.
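      The proposed behavior can be sketched as a catch block around the read step of a block transfer. This is a minimal sketch under stated assumptions, not the actual HDFS-13709 patch: `BadBlockReporter`, `BlockReader`, `transferBlock`, and `isEio` are hypothetical names, and detecting EIO by matching the message "Input/output error" (the Linux `strerror` text for EIO) is an assumed heuristic. Only EIO triggers a report; other I/O errors just fail the transfer as before.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class EioReportSketch {
    // Hypothetical callback standing in for the DataNode -> NameNode bad-block report.
    interface BadBlockReporter {
        void reportBadBlock(String blockId, String volume);
    }

    // Hypothetical read step of a block transfer.
    interface BlockReader {
        byte[] read() throws IOException;
    }

    // Assumed heuristic: on Linux, EIO usually surfaces in Java as an IOException
    // whose message (or a cause's message) contains "Input/output error".
    static boolean isEio(IOException e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            String msg = t.getMessage();
            if (msg != null && msg.contains("Input/output error")) {
                return true;
            }
        }
        return false;
    }

    static byte[] transferBlock(String blockId, String volume, BlockReader reader,
                                BadBlockReporter reporter) throws IOException {
        try {
            return reader.read();
        } catch (IOException e) {
            if (isEio(e)) {
                // Report right away so the NN stops treating this replica as healthy.
                reporter.reportBadBlock(blockId, volume);
            }
            throw e; // the transfer still fails; the caller decides what to do next
        }
    }

    public static void main(String[] args) {
        List<String> reported = new ArrayList<>();
        BadBlockReporter reporter = (id, vol) -> reported.add(id + "@" + vol);

        BlockReader badTrack = () -> { throw new IOException("Input/output error"); };
        try {
            transferBlock("blk_1001", "/data1", badTrack, reporter);
        } catch (IOException expected) {
            System.out.println("transfer failed: " + expected.getMessage());
        }
        System.out.println("reported: " + reported); // prints reported: [blk_1001@/data1]
    }
}
```

Rethrowing after the report keeps the existing failure semantics of the transfer and adds only the side effect of informing the NN.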

        Attachments

        1. HDFS-13709.002.patch
          12 kB
          Chen Zhang
        2. HDFS-13709.003.patch
          12 kB
          Chen Zhang
        3. HDFS-13709.004.patch
          14 kB
          Chen Zhang
        4. HDFS-13709.005.patch
          15 kB
          Chen Zhang
        5. HDFS-13709.patch
          10 kB
          Chen Zhang

              People

              • Assignee:
                Chen Zhang (zhangchen)
                Reporter:
                Chen Zhang (zhangchen)
              • Votes:
                0
                Watchers:
                8
