  Hadoop HDFS / HDFS-8881

Erasure Coding: internal blocks got missed and got over-replicated at the same time


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • None
    • None
    • erasure-coding
    • None

    Description

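      Replication checking depends on BlockManager#countNodes(), but countNodes() has a limitation for a striped blockGroup: it counts the stored internal blocks of the group as a total, without checking which internal block indices they belong to.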

      One missing internal block is caught by replication checking and handled by ReplicationMonitor.
      One over-replicated internal block is caught by replication checking and handled by processOverReplicatedBlocks.
      One missing internal block plus two over-replicated internal blocks at the same time are caught by replication checking, handled first by processOverReplicatedBlocks and later by ReplicationMonitor.
      One missing internal block plus one over-replicated internal block at the same time are NOT caught by replication checking.
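A minimal sketch of why the last case slips through, assuming (as the cases above imply) that the check compares a plain total of stored internal blocks against the 9 expected in a 6+3 schema. `NaiveStripedCheck`, `countLive` and `classify` are hypothetical names, not HDFS code; each list entry stands for one stored internal block, identified by its index.

```java
import java.util.List;

/**
 * Hypothetical model of total-count replication checking for one striped
 * block group. Not HDFS code: names and representation are illustrative.
 */
public class NaiveStripedCheck {
    // Expected internal blocks in a 6+3 schema (6 data + 3 parity).
    static final int EXPECTED = 9;

    /** Counts every stored internal block, ignoring which index it carries. */
    static int countLive(List<Integer> storedIndices) {
        return storedIndices.size();
    }

    /** Classifies the group using only the total count. */
    static String classify(List<Integer> storedIndices) {
        int live = countLive(storedIndices);
        if (live < EXPECTED) return "UNDER_REPLICATED";
        if (live > EXPECTED) return "OVER_REPLICATED";
        return "HEALTHY";
    }

    public static void main(String[] args) {
        // #0 missing: total 8.
        System.out.println(classify(List.of(1, 2, 3, 4, 5, 6, 7, 8)));
        // One extra copy of #1: total 10.
        System.out.println(classify(List.of(0, 1, 1, 2, 3, 4, 5, 6, 7, 8)));
        // #0 missing plus extra copies of #1 and #2: total 10.
        System.out.println(classify(List.of(1, 1, 2, 2, 3, 4, 5, 6, 7, 8)));
        // #0 missing plus one extra copy of #1: total 9, so nothing is flagged.
        System.out.println(classify(List.of(1, 1, 2, 3, 4, 5, 6, 7, 8)));
    }
}
```

Running `main` prints UNDER_REPLICATED, OVER_REPLICATED, OVER_REPLICATED and then HEALTHY: the last group has lost #0, yet its total is still 9.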

      Here, "at the same time" means that one missing internal block cannot be recovered while another internal block has become over-replicated nevertheless. For example:

      Scenario A:
      1. Blocks #0 and #1 are reported missing.
      2. A new #1 is recovered.
      3. The old #1 comes back, and the recovery work for #0 fails.

      Scenario B:
      1. A DN holding #1 is decommissioned or dies, and #1 is reconstructed on another DN.
      2. Block #0 is reported missing.
      3. The DN holding the old #1 is recommissioned, and the recovery work for #0 fails.
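Scenario A can be replayed with a hypothetical sketch that tracks only which internal block indices are stored (the `ScenarioA` class is illustrative, not HDFS code). It ends with nine stored internal blocks, so a check based on the total sees nothing wrong; scenario B reaches the same set of indices by a different path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical replay of scenario A for a 6+3 block group.
 * Each list entry is the index of one stored internal block.
 */
public class ScenarioA {
    static List<Integer> replay() {
        List<Integer> stored = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            stored.add(i); // healthy group: #0..#8
        }
        // Step 1: #0 and #1 are reported missing.
        // Integer.valueOf(...) removes by value, not by position.
        stored.remove(Integer.valueOf(0));
        stored.remove(Integer.valueOf(1));
        // Step 2: a new #1 is recovered on another DataNode.
        stored.add(1);
        // Step 3: the old #1 comes back; recovery of #0 fails, so #0 is never re-added.
        stored.add(1);
        Collections.sort(stored);
        return stored;
    }

    public static void main(String[] args) {
        List<Integer> stored = replay();
        System.out.println(stored + " count=" + stored.size());
    }
}
```

Running `main` prints `[1, 1, 2, 3, 4, 5, 6, 7, 8] count=9`.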

      In the end, assuming a 6+3 schema, the blockGroup holds internal blocks [1, 1, 2, 3, 4, 5, 6, 7, 8]. Unless the blockGroup is handled, clients always have to decode #0 when reading.
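One way to detect this state, sketched under the assumption that the index of each stored internal block is known, is to count copies per index instead of in total. This is an illustrative sketch (`IndexAwareCheck` and its methods are hypothetical), not necessarily what HDFS-8881.00.patch does.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical index-aware check for one striped block group:
 * counts stored copies per internal block index rather than in total.
 */
public class IndexAwareCheck {
    static final int TOTAL_BLOCKS = 9; // 6 data + 3 parity

    /** copies[i] = number of stored copies of internal block #i. */
    static int[] copiesPerIndex(List<Integer> storedIndices) {
        int[] copies = new int[TOTAL_BLOCKS];
        for (int idx : storedIndices) {
            copies[idx]++;
        }
        return copies;
    }

    /** Indices with no stored copy: candidates for reconstruction. */
    static List<Integer> missing(List<Integer> storedIndices) {
        int[] copies = copiesPerIndex(storedIndices);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < TOTAL_BLOCKS; i++) {
            if (copies[i] == 0) result.add(i);
        }
        return result;
    }

    /** Indices with more than one stored copy: candidates for removal. */
    static List<Integer> duplicated(List<Integer> storedIndices) {
        int[] copies = copiesPerIndex(storedIndices);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < TOTAL_BLOCKS; i++) {
            if (copies[i] > 1) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> group = List.of(1, 1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println("missing=" + missing(group) + " duplicated=" + duplicated(group));
    }
}
```

For the group above, `main` prints `missing=[0] duplicated=[1]`: #0 needs reconstruction and one copy of #1 is excess, even though the total count is the expected 9.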

      Attachments

        1. HDFS-8881.00.patch (6 kB, Walter Su)


            People

              Assignee: Walter Su (walter.k.su)
              Reporter: Walter Su (walter.k.su)
              Votes: 0
              Watchers: 7
