  1. Hadoop HDFS
  2. HDFS-6964

NN fails to fix under replication leading to data loss


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0-alpha, 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

    Description

      We've encountered lost blocks due to node failure even when there was ample time to fix the under-replication.

      Two nodes were lost. The third node, which held the last remaining replicas, averaged one block copy per heartbeat (3s) until ~7h later, when that node was also lost, resulting in over 50 lost blocks. When the node was restarted and sent its block report (BR), the NN immediately began fixing the replication.
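As a rough back-of-the-envelope check on the first incident, the sketch below estimates how many block copies the observed rate allows in the window described. It uses only the figures from the report (one copy per 3s heartbeat, roughly 7 hours); the function name `copies_in_window` is hypothetical, and the number of blocks the node actually held is not stated in the report, so this says nothing about whether the rate was sufficient.

```python
# Figures taken from the report; everything else is illustrative.
HEARTBEAT_INTERVAL_S = 3   # heartbeat interval stated in the report
BLOCKS_PER_HEARTBEAT = 1   # observed average copy rate
WINDOW_HOURS = 7           # approximate time until the third node was lost


def copies_in_window(hours, interval_s, blocks_per_heartbeat):
    """Estimate block copies scheduled over a window at a fixed per-heartbeat rate."""
    heartbeats = hours * 3600 // interval_s
    return heartbeats * blocks_per_heartbeat


estimated_copies = copies_in_window(
    WINDOW_HOURS, HEARTBEAT_INTERVAL_S, BLOCKS_PER_HEARTBEAT
)
print(estimated_copies)  # 8400
```

At the observed rate the node could have been asked to copy on the order of 8,400 blocks in that window, which gives a sense of scale for the 50+ blocks that were still lost.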

      In another data loss event, over 150 blocks were lost due to node failure. The timing of that node loss is not known, so, unlike the first case, there may have been inadequate time to fix the under-replication.


            People

              Assignee: Unassigned
              Reporter: Daryn Sharp (daryn)
              Votes: 0
              Watchers: 20
