  1. Hadoop HDFS
  2. HDFS-13709

Report bad block to NN when block transfer encounters an EIO exception

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0, 3.2.1, 3.1.3
    • Component/s: datanode
    • Labels: None

    Description

      In our online cluster, the BlockPoolSliceScanner is turned off, and bad disk tracks sometimes cause data loss.

      For example, suppose a block has 3 replicas on machines A, B, and C. If a bad track develops under A's replica and B and C later crash at the same time, the NN will try to replicate the block from A and fail. The block is now effectively corrupt, but nobody knows, because the NN believes there is still at least 1 healthy replica and keeps trying to replicate it.

      When a replica with data on a bad track is read, the OS returns an EIO error. If the DN reports the bad block as soon as it gets an EIO, we can detect this case early and try to avoid data loss.
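
      The idea can be sketched roughly as follows. This is only an illustration, not the code in the attached patches: `BadBlockReporter`, `ReplicaReader`, `isEio`, and `transferBlock` are hypothetical names rather than Hadoop APIs, and the check assumes that on Linux the JDK surfaces EIO as an `IOException` whose message contains "Input/output error".

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class EioReportSketch {

    /** Hypothetical stand-in for the DN-to-NN reporting path; not a Hadoop API. */
    interface BadBlockReporter {
        void reportBadBlock(String blockId);
    }

    /** Hypothetical stand-in for reading a replica from disk during a transfer. */
    interface ReplicaReader {
        void transfer(String blockId) throws IOException;
    }

    // Assumption: errno EIO reaches Java as an IOException carrying the
    // strerror text "Input/output error" in its message.
    static boolean isEio(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains("Input/output error");
    }

    /** Transfers a block; on EIO, reports the replica as bad, then rethrows. */
    static void transferBlock(String blockId, ReplicaReader reader,
                              BadBlockReporter reporter) throws IOException {
        try {
            reader.transfer(blockId);
        } catch (IOException e) {
            if (isEio(e)) {
                // Tell the NN right away so it stops counting this replica as healthy.
                reporter.reportBadBlock(blockId);
            }
            throw e; // the transfer itself still fails
        }
    }

    public static void main(String[] args) {
        List<String> reported = new ArrayList<>();
        try {
            // Simulate a read that hits a bad track.
            transferBlock("blk_1",
                id -> { throw new IOException("Input/output error"); },
                reported::add);
        } catch (IOException expected) {
            // Expected: the failure is propagated after the report is sent.
        }
        System.out.println(reported);
    }
}
```

      Other I/O failures (for example a timeout on the receiving side) are deliberately not reported here, since they say nothing about the health of the local replica.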

      Attachments

        1. HDFS-13709.patch
          10 kB
          Chen Zhang
        2. HDFS-13709.005.patch
          15 kB
          Chen Zhang
        3. HDFS-13709.004.patch
          14 kB
          Chen Zhang
        4. HDFS-13709.003.patch
          12 kB
          Chen Zhang
        5. HDFS-13709.002.patch
          12 kB
          Chen Zhang

            People

              Assignee: Chen Zhang (zhangchen)
              Reporter: Chen Zhang (zhangchen)
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: