Hadoop HDFS / HDFS-15187

CORRUPT replica mismatch between namenodes after failover

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0, 3.2.3
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      A corrupt replica identified by the active Namenode is not identified by the other Namenode after that Namenode fails over to active, in the case where the replica was marked corrupt because of updatePipeline.

      Steps to reproduce:
      1. Create a file and, while writing to it, bring one datanode down to trigger updatePipeline.
      2. Write some more data.
      3. Close the file.
      4. Restart the datanode that was shut down.
      5. The replica on that datanode is identified as CORRUPT, and the corrupt count is 1.
      6. Fail over to the other Namenode.
      7. Wait for all pending IBRs (incremental block reports) to be processed.
      8. The corrupt count does not match the first Namenode's, and FSCK does not show the corrupt replica.
      9. Fail back to the first Namenode.
      10. The corrupt count and the corrupt replica are present again.

      The two Namenodes therefore report different corrupt replica information.
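
The description attributes the corrupt marking to updatePipeline. One way to picture steps 1 to 5 is through generation stamps: pipeline recovery gives the block a newer generation stamp, so the replica on the datanode that was down still carries the older stamp when it is reported after restart. The Java sketch below is a toy model under that assumption, not HDFS code: `NamenodeView`, `runScenario`, the datanode names, and the stamp values are all hypothetical. It does not reproduce the bug or its root cause, which the description does not state. It only illustrates the invariant the issue says is violated: two namenodes that see the same pipeline update and the same replica reports should end up with the same corrupt count.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of how a namenode could track one block's replicas; not HDFS code. */
public class GenStampSketch {

    /** Hypothetical stand-in for a namenode's bookkeeping of a single block. */
    public static final class NamenodeView {
        private long expectedGenStamp;
        private final Set<String> liveReplicas = new HashSet<>();
        private final Set<String> corruptReplicas = new HashSet<>();

        public NamenodeView(long initialGenStamp) {
            this.expectedGenStamp = initialGenStamp;
        }

        /** Pipeline recovery: the block continues under a strictly newer generation stamp. */
        public void updatePipeline(long newGenStamp) {
            if (newGenStamp <= expectedGenStamp) {
                throw new IllegalArgumentException("generation stamp must increase");
            }
            expectedGenStamp = newGenStamp;
        }

        /** A datanode reports its replica; an older stamp means it missed later writes. */
        public void processReportedReplica(String datanode, long reportedGenStamp) {
            if (reportedGenStamp < expectedGenStamp) {
                liveReplicas.remove(datanode);
                corruptReplicas.add(datanode);
            } else {
                corruptReplicas.remove(datanode);
                liveReplicas.add(datanode);
            }
        }

        public int corruptCount() {
            return corruptReplicas.size();
        }
    }

    /** Replays steps 1 to 5 of the scenario against one view and returns its corrupt count. */
    public static int runScenario(NamenodeView nn) {
        nn.updatePipeline(1002L);                 // dn3 went down mid-write
        nn.processReportedReplica("dn1", 1002L);  // stayed in the pipeline
        nn.processReportedReplica("dn2", 1002L);  // stayed in the pipeline
        nn.processReportedReplica("dn3", 1001L);  // restarted with the stale replica
        return nn.corruptCount();
    }

    public static void main(String[] args) {
        // Both views start from the same state and receive the same events.
        int first = runScenario(new NamenodeView(1001L));
        int second = runScenario(new NamenodeView(1001L));
        System.out.println("first namenode corrupt count:  " + first);
        System.out.println("second namenode corrupt count: " + second);
        System.out.println("views agree: " + (first == second));
    }
}
```

In this model both views report a corrupt count of 1. On a real cluster, the equivalent check is to compare the corrupt count and FSCK output on each Namenode while it is active, as in steps 5, 8, and 10.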

      Attachments

        1. HDFS-15187-03.patch
          8 kB
          Ayush Saxena
        2. HDFS-15187-02.patch
          8 kB
          Ayush Saxena
        3. HDFS-15187-01.patch
          8 kB
          Ayush Saxena

          People

            Assignee: Ayush Saxena (ayushtkn)
            Reporter: Ayush Saxena (ayushtkn)
            Votes: 0
            Watchers: 5
