  1. Hadoop HDFS
  2. HDFS-15187

CORRUPT replica mismatch between namenodes after failover

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      A corrupt replica identified by the Active NameNode is not identified by the other NameNode once that NameNode fails over to Active. This happens when the replica was marked corrupt because of updatePipeline.

      Steps to reproduce:
      1. Create a file and, while it is being written, shut down one datanode to trigger updatePipeline.
      2. Write some more data.
      3. Close the file.
      4. Restart the datanode that was shut down.
      5. The replica on that datanode is identified as CORRUPT, and the corrupt replica count is 1.
      6. Fail over to the other NameNode.
      7. Wait for all pending IBRs (incremental block reports) to be processed.
      8. The corrupt replica count is not the same as before the failover, and FSCK does not show the corrupt replica.
      9. Fail back to the first NameNode.
      10. The corrupt replica count and the corrupt replica are reported again.

      The two NameNodes report different corrupt replica state for the same block.
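
Steps 1 to 5 can be illustrated with a toy model. This is a minimal sketch, not Hadoop code: the class `PipelineUpdateSketch`, its methods, the datanode names and the generation stamp values are all hypothetical. It assumes the replica is flagged because its generation stamp predates the one assigned to the block by updatePipeline. It does not model the failover or the cause of the mismatch between NameNodes.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only (hypothetical names); not the HDFS implementation. */
class PipelineUpdateSketch {

    /** One NameNode's view of a single block. */
    static final class BlockRecord {
        long generationStamp;
        // datanode name -> generation stamp of the replica it reported
        final Map<String, Long> reportedReplicas = new HashMap<>();

        BlockRecord(long generationStamp) {
            this.generationStamp = generationStamp;
        }
    }

    /** Assumption: a pipeline update assigns the block a newer generation stamp. */
    static void updatePipeline(BlockRecord block, long newGenerationStamp) {
        if (newGenerationStamp <= block.generationStamp) {
            throw new IllegalArgumentException("generation stamp must increase");
        }
        block.generationStamp = newGenerationStamp;
    }

    /** Records the replica a datanode reports for this block. */
    static void reportReplica(BlockRecord block, String datanode, long replicaGenerationStamp) {
        block.reportedReplicas.put(datanode, replicaGenerationStamp);
    }

    /** Counts replicas whose generation stamp is older than the block's. */
    static long corruptCount(BlockRecord block) {
        return block.reportedReplicas.values().stream()
                .filter(gs -> gs < block.generationStamp)
                .count();
    }

    public static void main(String[] args) {
        BlockRecord block = new BlockRecord(1001L);

        // Step 1: dn3 goes down mid-write; the pipeline is updated without it.
        updatePipeline(block, 1002L);

        // Steps 2-3: the remaining datanodes finish the write and the file is closed.
        reportReplica(block, "dn1", 1002L);
        reportReplica(block, "dn2", 1002L);

        // Steps 4-5: dn3 restarts and reports its stale replica.
        reportReplica(block, "dn3", 1001L);

        System.out.println("corrupt replicas: " + corruptCount(block));
    }
}
```

Under this model the count is 1 after step 5. The behavior expected of a real cluster is that every NameNode arrives at the same count after a failover, which is what steps 6 to 10 show is not happening.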

        Attachments

        1. HDFS-15187-01.patch
          8 kB
          Ayush Saxena
        2. HDFS-15187-02.patch
          8 kB
          Ayush Saxena
        3. HDFS-15187-03.patch
          8 kB
          Ayush Saxena

            People

            • Assignee:
              ayushtkn Ayush Saxena
            • Reporter:
              ayushtkn Ayush Saxena