  1. Hadoop HDFS
  2. HDFS-15304

Infinite loop between DN and NN at rare condition


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      During the investigation that led to HDFS-15303, we identified the following infinite loop between the NN and the DNs affected by the data directory layout problem:

      • for a particular misplaced block, the VolumeScanner finds the block file and realizes that it is not part of the block map
      • the block is added to the block map
      • at the next FBR (full block report), the block is reported to the NN
      • the NN finds that the block should already have been deleted, as the corresponding inode was already deleted
      • the NN issues the deletion of the block on the DataNode
      • the DataNode runs the delete routine, but it silently fails to delete anything, because it tries to delete the block from the wrong internal subdir: one calculated from the block ID with a different algorithm than the one that placed the file
      • the block is removed from the block map
      • the VolumeScanner finds the block file again and adds it back to the block map, and the cycle repeats
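
      The cycle above can be sketched as a toy model. This is not Hadoop code: the class, its methods, and the `expectedDir` placeholder are all hypothetical names made up for illustration, and the scanner, block report, and deletion are reduced to plain collection operations. It only aims to show why the loop never converges when the file sits outside the directory the delete routine checks.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the scan/report/delete cycle; none of these names are Hadoop classes.
class MisplacedBlockLoopSketch {
    // Files physically on the volume: block id -> directory that holds the file.
    final Map<Long, String> filesOnDisk = new HashMap<>();
    // The DataNode's in-memory block map.
    final Set<Long> blockMap = new HashSet<>();

    // Hypothetical placeholder for the directory the running DN version computes.
    static String expectedDir(long blockId) {
        return "current-layout/subdir" + (blockId & 0x1F);
    }

    // Scanner step: every file found on disk but missing from the map is added.
    void scan() {
        blockMap.addAll(filesOnDisk.keySet());
    }

    // Delete step: only the expected directory is checked, and a miss is silent.
    void delete(long blockId) {
        if (expectedDir(blockId).equals(filesOnDisk.get(blockId))) {
            filesOnDisk.remove(blockId);
        }
        blockMap.remove(blockId); // the map entry is dropped either way
    }

    // Runs the cycle `rounds` times and returns how many deletions the NN issued.
    static int simulate(long blockId, String actualDir, int rounds) {
        MisplacedBlockLoopSketch dn = new MisplacedBlockLoopSketch();
        dn.filesOnDisk.put(blockId, actualDir);
        int deletionsIssued = 0;
        for (int i = 0; i < rounds; i++) {
            dn.scan();
            // Full block report: the inode is gone, so the NN orders deletion
            // of every block the DN reports.
            for (Long reported : new HashSet<>(dn.blockMap)) {
                dn.delete(reported);
                deletionsIssued++;
            }
        }
        return deletionsIssued;
    }

    public static void main(String[] args) {
        System.out.println("misplaced: " + simulate(42L, "old-layout/subdir42", 5));
        System.out.println("in place:  " + simulate(42L, expectedDir(42L), 5));
    }
}
```

      In this model a block stored where the delete routine expects it is deleted once and never reported again, while a misplaced block triggers a new deletion command in every round.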

      The problem can happen only when a DataNode has a mixed layout due to some issue: some blocks sit in the subdir that is correct for the Hadoop 2 format while the DN already runs Hadoop 3, or vice versa if the problematic layout arose during a rollback.
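
      To illustrate how the two layouts can disagree about a block's location, here is a minimal sketch. It assumes the Hadoop 2 layout uses 256 x 256 subdirectories and the Hadoop 3 layout uses 32 x 32, each derived from bits 16-23 and 8-15 of the block ID; the masks are my recollection of `DatanodeUtil.idToBlockDir` and should be checked against the actual source of the versions involved. The class and method names are hypothetical.

```java
// Sketch of the two subdir calculations; masks are an assumption, see the note above.
class BlockDirLayoutSketch {
    // Assumed Hadoop 2 style layout: 256 x 256 subdirectories.
    static String dir256(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return "subdir" + d1 + "/subdir" + d2;
    }

    // Assumed Hadoop 3 style layout: 32 x 32 subdirectories.
    static String dir32(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0x1F);
        int d2 = (int) ((blockId >> 8) & 0x1F);
        return "subdir" + d1 + "/subdir" + d2;
    }

    // True when both layouts place the block in the same directory.
    static boolean sameDir(long blockId) {
        return dir256(blockId).equals(dir32(blockId));
    }

    public static void main(String[] args) {
        long blockId = 0x12345678L;
        System.out.println("256x256: " + dir256(blockId)); // subdir52/subdir86
        System.out.println("32x32:   " + dir32(blockId));  // subdir20/subdir22
        System.out.println("same:    " + sameDir(blockId));
    }
}
```

      Under these assumptions the two results coincide only when both relevant bytes of the block ID are below 32, so most blocks left in the other layout's directory would be invisible to a delete routine that computes the path with the current layout.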


          People

            pifta István Fajth
            pifta István Fajth
