Hadoop HDFS / HDFS-11179

LightWeightHashSet can't correctly remove blocks that have a large blockId


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

    Description

      Our test cluster hit a problem where postponedMisreplicatedBlocksCount drops below zero. The cluster runs a recent 3.0 build. We haven't created any EC files yet. This is the NN's log:

      Rescan of postponedMisreplicatedBlocks completed in 13 msecs. 448 blocks are left. 176 blocks are removed.
      Rescan of postponedMisreplicatedBlocks completed in 13 msecs. 272 blocks are left. 176 blocks are removed.
      Rescan of postponedMisreplicatedBlocks completed in 14 msecs. 96 blocks are left. 176 blocks are removed.
      Rescan of postponedMisreplicatedBlocks completed in 327 msecs. -77 blocks are left. 177 blocks are removed.
      Rescan of postponedMisreplicatedBlocks completed in 15 msecs. -253 blocks are left. 179 blocks are removed.
      Rescan of postponedMisreplicatedBlocks completed in 14 msecs. -432 blocks are left. 179 blocks are removed.
      

      I looked into this issue and found that it is caused by LightWeightHashSet, which was recently adopted for postponedMisreplicatedBlocks. When LightWeightHashSet removes blocks that have a large blockId, an overflow happens and the blocks can't be removed correctly (let alone EC blocks, whose blockIds start at the minimum value of long).
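
For illustration only, here is a minimal Java sketch of the general kind of overflow pitfall described above. It is not the LightWeightHashSet code, and since this issue was resolved as Invalid it should not be read as a confirmed diagnosis. The class and method names (BlockIdHashSketch, foldHash, indexByRemainder, indexByMask) are hypothetical. It assumes a 64-bit id folded into a 32-bit hash with the same formula as Long.hashCode, and shows that a signed remainder can yield a negative bucket index for ids near the minimum of long, while a bit mask over a power-of-two capacity cannot.

```java
class BlockIdHashSketch {
    // Hypothetical helper: fold a 64-bit id into 32 bits (same formula as Long.hashCode).
    static int foldHash(long blockId) {
        return (int) (blockId ^ (blockId >>> 32));
    }

    // Bucket index via signed remainder: negative whenever the hash is negative
    // and not a multiple of capacity.
    static int indexByRemainder(int hash, int capacity) {
        return hash % capacity;
    }

    // Bucket index via bit mask: always in [0, capacity) when capacity is a power of two.
    static int indexByMask(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        int capacity = 16; // assumed power-of-two table size
        long[] ids = {1L, Long.MIN_VALUE + 1};
        for (long id : ids) {
            int h = foldHash(id);
            System.out.println("id=" + id
                + " hash=" + h
                + " remainderIndex=" + indexByRemainder(h, capacity)
                + " maskIndex=" + indexByMask(h, capacity));
        }
    }
}
```

With these assumptions, the id Long.MIN_VALUE + 1 folds to a negative hash, so the remainder-based index is negative while the masked index stays within the table.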

            People

              Assignee: tasanuma Takanobu Asanuma
              Reporter: tasanuma Takanobu Asanuma