  1. Hadoop HDFS
  2. HDFS-8031 Follow-on work for erasure coding phase I (striping layout)
  3. HDFS-9837

BlockManager#countNodes should be able to detect duplicated internal blocks

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 3.0.0-alpha1
    • 3.0.0-alpha1
    • None
    • None
    • Reviewed

    Description

      Currently, BlockManager#countNodes only counts the number of replicas/internal blocks, so it cannot detect the under-replicated scenario in which a striped EC block has 9 internal blocks but some of them are duplicated data/parity blocks. For example, b8 is missing while two copies of b0 exist:
      b0, b1, b2, b3, b4, b5, b6, b7, b0
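
The difference between counting reported internal blocks and counting distinct internal block indices can be illustrated with a small standalone sketch. This is not the code from the attached patches: the class `StripedCountSketch` and its methods are hypothetical names, and internal blocks are modeled simply as integer indices 0..8.

```java
import java.util.BitSet;

// Hypothetical illustration only; not the HDFS-9837 patch or the BlockManager API.
class StripedCountSketch {

    /** Naive count: every reported internal block is treated as live. */
    static int rawCount(int[] reportedIndices) {
        return reportedIndices.length;
    }

    /** Duplicate-aware count: each internal block index is counted at most once. */
    static int distinctCount(int[] reportedIndices, int totalInternalBlocks) {
        BitSet seen = new BitSet(totalInternalBlocks);
        for (int idx : reportedIndices) {
            // Ignore indices outside the block group (assumed invalid in this model).
            if (idx >= 0 && idx < totalInternalBlocks) {
                seen.set(idx);
            }
        }
        return seen.cardinality();
    }

    public static void main(String[] args) {
        // b0, b1, b2, b3, b4, b5, b6, b7, b0: b8 is missing, b0 is duplicated.
        int[] reported = {0, 1, 2, 3, 4, 5, 6, 7, 0};
        System.out.println("raw count      = " + rawCount(reported));
        System.out.println("distinct count = " + distinctCount(reported, 9));
    }
}
```

In this model the raw count reports 9 and looks healthy, while the distinct count reports 8 and exposes the missing b8.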

      If the NameNode (NN) keeps running, it is able to detect the duplication of b0 and will put the duplicated internal block into the excess map. countNodes excludes internal blocks captured in the excess map, so it can still return the correct number of live replicas. However, if the NN restarts before sending out the reconstruction command, the missing internal block can no longer be detected. The following steps reproduce the issue:

      1. Create an EC file.
      2. Kill DN1 and wait for the reconstruction to happen.
      3. Start DN1 again.
      4. Kill DN2 and restart the NN immediately.
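
The reproduction steps can be traced with a toy in-memory model. Everything here is an assumption made for illustration: the class `RestartScenarioSketch`, the DataNode names, the choice of which DataNode holds which internal block, and the premise that the excess map is in-memory state that starts empty after an NN restart. It does not use any real HDFS class.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the scenario; not HDFS code.
class RestartScenarioSketch {
    // DataNode name -> index of the internal block it reports.
    final Map<String, Integer> reported = new LinkedHashMap<>();
    // DataNodes whose internal block the NN has marked as excess.
    final Set<String> excess = new HashSet<>();

    /** State after steps 1-3: DN1 is back, so b0 exists on both dn1 and dn10. */
    static RestartScenarioSketch afterReconstruction() {
        RestartScenarioSketch s = new RestartScenarioSketch();
        for (int i = 0; i < 9; i++) {
            s.reported.put("dn" + (i + 1), i); // dn1:b0 ... dn9:b8
        }
        s.reported.put("dn10", 0);             // reconstructed copy of b0
        s.excess.add("dn10");                  // running NN marks the duplicate as excess
        return s;
    }

    void killDataNode(String dn) {
        reported.remove(dn);
        excess.remove(dn);
    }

    /** Assumption of this model: the excess map is empty after a restart. */
    void restartNameNode() {
        excess.clear();
    }

    /** Counting that relies only on the excess map to discount duplicates. */
    int liveExcludingExcess() {
        int live = 0;
        for (String dn : reported.keySet()) {
            if (!excess.contains(dn)) {
                live++;
            }
        }
        return live;
    }

    /** Counting that also de-duplicates by internal block index. */
    int liveDistinct() {
        Set<Integer> seen = new HashSet<>();
        for (Map.Entry<String, Integer> e : reported.entrySet()) {
            if (!excess.contains(e.getKey())) {
                seen.add(e.getValue());
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        RestartScenarioSketch s = afterReconstruction();
        s.killDataNode("dn2"); // step 4: b1 is now missing
        System.out.println("running NN, excess-aware count: " + s.liveExcludingExcess());
        s.restartNameNode();
        System.out.println("after restart, excess-aware count: " + s.liveExcludingExcess());
        System.out.println("after restart, distinct count: " + s.liveDistinct());
    }
}
```

Under these assumptions, the excess-aware count is 8 while the NN keeps running, rises to 9 once the restart empties the excess map, and the index-based count stays at 8, which is the behavior the issue title asks countNodes to provide.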

      Attachments

        1. HDFS-9837.004.patch
          32 kB
          Jing Zhao
        2. HDFS-9837.003.patch
          33 kB
          Jing Zhao
        3. HDFS-9837.002.patch
          33 kB
          Jing Zhao
        4. HDFS-9837.001.patch
          25 kB
          Jing Zhao
        5. HDFS-9837.000.patch
          16 kB
          Jing Zhao

          People

            jingzhao Jing Zhao