Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-5225

datanode keeps logging the same 'is no longer in the dataset' message over and over again

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Blocker Blocker
    • Resolution: Duplicate
    • Affects Version/s: 2.1.1-beta
    • Fix Version/s: None
    • Component/s: datanode
    • Labels:
      None
    • Target Version/s:

      Description

      I was running the usual Bigtop testing on 2.1.1-beta RC1 with the following configuration: 4 nodes fully distributed cluster with security on.

      All of a sudden my DN ate up all of the space in /var/log logging the following message repeatedly:

      2013-09-18 20:51:12,046 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369 is no longer in the dataset
      

      It wouldn't answer to a jstack and jstack -F ended up being useless.

      Here's what I was able to find in the NameNode logs regarding this block ID:

      fgrep -rI 'blk_1073742189' hadoop-hdfs-namenode-ip-10-224-158-152.log
      2013-09-18 18:03:16,972 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/jenkins/testAppendInputWedSep18180222UTC2013/test4.fileWedSep18180222UTC2013._COPYING_. BP-1884637155-10.224.158.152-1379524544853 blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]}
      2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.224.158.152:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
      2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.34.74.206:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
      2013-09-18 18:03:17,224 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.83.107.80:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
      2013-09-18 18:03:17,899 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369, newGenerationStamp=1370, newLength=1048576, newNodes=[10.83.107.80:1004, 10.34.74.206:1004, 10.224.158.152:1004], clientName=DFSClient_NONMAPREDUCE_-450304083_1)
      2013-09-18 18:03:17,904 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369) successfully to BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370
      2013-09-18 18:03:18,540 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370, newGenerationStamp=1371, newLength=2097152, newNodes=[10.83.107.80:1004, 10.34.74.206:1004, 10.224.158.152:1004], clientName=DFSClient_NONMAPREDUCE_-450304083_1)
      2013-09-18 18:03:18,548 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370) successfully to BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1371
      2013-09-18 18:03:26,150 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073742189_1371 10.83.107.80:1004 10.34.74.206:1004 10.224.158.152:1004 
      2013-09-18 18:03:26,847 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 10.34.74.206:1004 to delete [blk_1073742178_1359, blk_1073742183_1362, blk_1073742184_1363, blk_1073742186_1366, blk_1073742188_1368, blk_1073742189_1371]
      2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 10.224.158.152:1004 to delete [blk_1073742177_1353, blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356, blk_1073742181_1358, blk_1073742182_1361, blk_1073742185_1364, blk_1073742187_1367, blk_1073742188_1368, blk_1073742189_1371]
      2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 10.83.107.80:1004 to delete [blk_1073742177_1353, blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356, blk_1073742181_1358, blk_1073742182_1361, blk_1073742183_1362, blk_1073742184_1363, blk_1073742185_1364, blk_1073742186_1366, blk_1073742187_1367, blk_1073742188_1368, blk_1073742189_1371]
      

      This seems to suggest that the block was successfully deleted, but then DN got into a death spiral inside of a scanner.

      I can keep the cluster running for a few days if anybody is willing to take a look. Ask me for creds via a personal email.

      1. HDFS-5225.2.patch
        10 kB
        Tsuyoshi OZAWA
      2. HDFS-5225-reproduce.1.txt
        9 kB
        Tsuyoshi OZAWA
      3. HDFS-5225.1.patch
        1 kB
        Tsuyoshi OZAWA

        Issue Links

          Activity

          Roman Shaposhnik created issue -
          Roman Shaposhnik made changes -
          Field Original Value New Value
          Summary datanode keeps logging the same 'is no longer in the dataset' message over and over agin datanode keeps logging the same 'is no longer in the dataset' message over and over again
          Tsuyoshi OZAWA made changes -
          Attachment HDFS-5225.1.patch [ 12603976 ]
          Tsuyoshi OZAWA made changes -
          Attachment HDFS-5225-reproduce.1.txt [ 12604728 ]
          Arun C Murthy made changes -
          Priority Major [ 3 ] Blocker [ 1 ]
          Arun C Murthy made changes -
          Target Version/s 2.1.2-beta [ 12325049 ]
          Tsuyoshi OZAWA made changes -
          Attachment HDFS-5225.2.patch [ 12605069 ]
          Tsuyoshi OZAWA made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Assignee Tsuyoshi OZAWA [ ozawa ]
          Tsuyoshi OZAWA made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Kihwal Lee made changes -
          Link This issue is related to HDFS-5031 [ HDFS-5031 ]
          Kihwal Lee made changes -
          Link This issue is related to HDFS-4797 [ HDFS-4797 ]
          Kihwal Lee made changes -
          Link This issue is duplicated by HDFS-5031 [ HDFS-5031 ]
          Kihwal Lee made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]

            People

            • Assignee:
              Tsuyoshi OZAWA
              Reporter:
              Roman Shaposhnik
            • Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development