Hadoop HDFS
HDFS-5225

datanode keeps logging the same 'is no longer in the dataset' message over and over again

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 2.1.1-beta
    • Fix Version/s: None
    • Component/s: datanode
    • Labels:
      None
    • Target Version/s:

      Description

      I was running the usual Bigtop testing on 2.1.1-beta RC1 with the following configuration: 4 nodes fully distributed cluster with security on.

      All of a sudden my DN ate up all of the space in /var/log logging the following message repeatedly:

      2013-09-18 20:51:12,046 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369 is no longer in the dataset
      

      It wouldn't respond to jstack, and jstack -F ended up being useless.

      Here's what I was able to find in the NameNode logs regarding this block ID:

      fgrep -rI 'blk_1073742189' hadoop-hdfs-namenode-ip-10-224-158-152.log
      2013-09-18 18:03:16,972 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/jenkins/testAppendInputWedSep18180222UTC2013/test4.fileWedSep18180222UTC2013._COPYING_. BP-1884637155-10.224.158.152-1379524544853 blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]}
      2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.224.158.152:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
      2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.34.74.206:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
      2013-09-18 18:03:17,224 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.83.107.80:1004 is added to blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], ReplicaUnderConstruction[10.34.74.206:1004|RBW], ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
      2013-09-18 18:03:17,899 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369, newGenerationStamp=1370, newLength=1048576, newNodes=[10.83.107.80:1004, 10.34.74.206:1004, 10.224.158.152:1004], clientName=DFSClient_NONMAPREDUCE_-450304083_1)
      2013-09-18 18:03:17,904 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369) successfully to BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370
      2013-09-18 18:03:18,540 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370, newGenerationStamp=1371, newLength=2097152, newNodes=[10.83.107.80:1004, 10.34.74.206:1004, 10.224.158.152:1004], clientName=DFSClient_NONMAPREDUCE_-450304083_1)
      2013-09-18 18:03:18,548 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370) successfully to BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1371
      2013-09-18 18:03:26,150 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073742189_1371 10.83.107.80:1004 10.34.74.206:1004 10.224.158.152:1004 
      2013-09-18 18:03:26,847 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 10.34.74.206:1004 to delete [blk_1073742178_1359, blk_1073742183_1362, blk_1073742184_1363, blk_1073742186_1366, blk_1073742188_1368, blk_1073742189_1371]
      2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 10.224.158.152:1004 to delete [blk_1073742177_1353, blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356, blk_1073742181_1358, blk_1073742182_1361, blk_1073742185_1364, blk_1073742187_1367, blk_1073742188_1368, blk_1073742189_1371]
      2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 10.83.107.80:1004 to delete [blk_1073742177_1353, blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356, blk_1073742181_1358, blk_1073742182_1361, blk_1073742183_1362, blk_1073742184_1363, blk_1073742185_1364, blk_1073742186_1366, blk_1073742187_1367, blk_1073742188_1368, blk_1073742189_1371]
      

      This seems to suggest that the block was successfully deleted, but then the DN got into a death spiral inside the scanner.

      I can keep the cluster running for a few days if anybody is willing to take a look. Ask me for creds via a personal email.

      1. HDFS-5225-reproduce.1.txt
        9 kB
        Tsuyoshi Ozawa
      2. HDFS-5225.2.patch
        10 kB
        Tsuyoshi Ozawa
      3. HDFS-5225.1.patch
        1 kB
        Tsuyoshi Ozawa

        Issue Links

          Activity

          Colin Patrick McCabe added a comment -

          I haven't had time to look into this fully, but I suspect that it's some kind of race condition in the BlockPoolSliceScanner. Whatever it is, it certainly did lead to a lot of logs getting generated (there were gigabytes of that "is no longer in the dataset" message). The datanode stayed in this state until restarted.

          Tsuyoshi Ozawa added a comment -

          Which way should we deal with this problem: a log4j setting, or a source-code-level fix? If we fix it at the source-code level, we can add a configuration option to limit the "is no longer in the dataset" message.

          Roman Shaposhnik added a comment -

          The problem here is really not log overflow. That can be dealt with (as I mentioned). The problem is that the observed behavior seems to indicate a deeper (potentially significant) problem along the lines of what Colin Patrick McCabe has mentioned.

          Colin Patrick McCabe added a comment -

          It's not a log4j problem. The problem is that the DataNode is logging millions of identical lines about the same block. The scanner has gotten stuck in a loop, basically.

          Junping Du added a comment -

          Hi Colin and Roman, I think the following code won't delete the block from blockInfoSet if the block has already been removed from blockMap:

            /** Deletes the block from internal structures */
            synchronized void deleteBlock(Block block) {
              BlockScanInfo info = blockMap.get(block);
              if ( info != null ) {
                delBlockInfo(info);
              }
            }
          

          Then, if that block happens to be the first block in blockInfoSet, the log will loop forever.
          The inconsistency between blockMap and blockInfoSet may arise because blockInfoSet is an unsynchronized TreeSet and therefore not thread-safe. Maybe we should replace it with a ConcurrentSkipListMap (which preserves ordering and supports concurrency). Thoughts?
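          A minimal sketch of this failure mode (hypothetical class names like SimpleScanner and Info, not the actual HDFS code): if the blockMap lookup misses, the stale entry is never removed from the ordered set, so a scanner that always looks at the first element keeps picking the same block.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical, simplified model of BlockPoolSliceScanner's two structures.
class SimpleScanner {
    static class Info {
        final long blockId;
        final long lastScanTime;
        Info(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
    }

    final Map<Long, Info> blockMap = new HashMap<>();
    // Ordered by last scan time, oldest first (standing in for blockInfoSet).
    final TreeSet<Info> blockInfoSet = new TreeSet<>(
        Comparator.comparingLong((Info i) -> i.lastScanTime)
                  .thenComparingLong(i -> i.blockId));

    void addBlockInfo(Info info) {
        blockMap.put(info.blockId, info);
        blockInfoSet.add(info);
    }

    // Mirrors the deleteBlock() shown above: if the blockMap lookup
    // misses, the corresponding blockInfoSet entry is never removed.
    void deleteBlock(long blockId) {
        Info info = blockMap.get(blockId);
        if (info != null) {
            blockMap.remove(blockId);
            blockInfoSet.remove(info);
        }
    }

    long firstBlockId() {
        return blockInfoSet.first().blockId;
    }

    public static void main(String[] args) {
        SimpleScanner s = new SimpleScanner();
        s.addBlockInfo(new Info(1, 100));
        s.addBlockInfo(new Info(2, 200));
        s.blockMap.remove(1L);   // block 1 already gone from the map
        s.deleteBlock(1);        // no-op: the lookup misses
        // The stale entry still heads the set, so a scanner that always
        // inspects first() would keep picking block 1 forever.
        System.out.println("first = " + s.firstBlockId()); // first = 1
    }
}
```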

          Tsuyoshi Ozawa added a comment -

          Colin and Roman,

          Thank you for sharing! Now I understand the problem correctly.

          Junping,

          Both blockMap and blockInfoSet must be synchronized on the BlockPoolSliceScanner instance in this case. The verifyFirstBlock method refers to blockInfoSet, but it's not synchronized. IMHO, if we make the verifyFirstBlock method synchronized, this problem may be solved. What do you think?

          Tsuyoshi Ozawa added a comment -

          >> The verifyFirstBlock method refers to blockInfoSet, but it's not synchronized. IMHO, if we make the verifyFirstBlock method synchronized, this problem may be solved. What do you think?

          Sorry, I meant the scan method, not the verifyFirstBlock method.

          Tsuyoshi Ozawa added a comment -

          Maybe this is a critical section to be fixed.

          Junping Du added a comment -

          Hi Tsuyoshi, I think this fix may also work, as this is the only unsynchronized block that accesses blockInfoSet. However, why don't you merge the synchronized block with the code block above, which is also synchronized? It would be nice if you could add a unit test that reproduces the bug and verifies the patch fixes it.

          Tsuyoshi Ozawa added a comment -

          A patch for reproducing this problem.

          The analysis of the code is as follows:

          1. DataBlockScanner#run() tries to scan blocks via BlockPoolSliceScanner#scanBlockPoolSlice in the main loop.
          2. At the same time, other threads (BlockReceiver's constructor) issue blockScanner.deleteBlock.
          3. BlockPoolSliceScanner#deleteBlock() tries to remove the BlockScanInfo from blockInfoSet. However, this is not visible from BlockPoolSliceScanner#scanBlockPoolSlice's thread, because there is no memory barrier. This may be the critical section we've faced.
          4. Other threads issue FsDatasetImpl#unfinalizeBlock and FsDatasetImpl#checkAndUpdate, which update BlockPoolSliceScanner#dataset and remove the block entry from the dataset. This is why the log 'is no longer in the dataset' is dumped repeatedly.

          Note that this may not be complete: I've seen the log 'is no longer in the dataset' dumped repeatedly, but only sometimes with my code.

          Tsuyoshi Ozawa added a comment -

          Note that this may not be complete: I've seen the log 'is no longer in the dataset' dumped repeatedly, but only sometimes with my code.

          I mean the log messages themselves are dumped every time, but the "logging the same 'is no longer in the dataset' message over and over again" behavior happens only sometimes.

          Tsuyoshi Ozawa added a comment -

          Add test for synchronization of BlockPoolSliceScanner#blockInfoSet.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12605069/HDFS-5225.2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.datanode.TestMultipleNNDataBlockScanner
          org.apache.hadoop.hdfs.web.TestWebHDFS

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5036//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5036//console

          This message is automatically generated.

          Tsuyoshi Ozawa added a comment -

          Some TestMultipleNNDataBlockScanner-related cases fail with my patch. I'll recheck why they fail.

          Junping Du added a comment -

          Hi Tsuyoshi Ozawa, thanks for the patch! Besides fixing the test failures, it would be nice if you could attach a log of the bug you reproduced (only the relevant part) so that Roman can judge whether it is the same bug.

          Kihwal Lee added a comment -

          I am seeing cases of repeated logging of "Verification succeeded for xxx" for the same block. Since it loops, the disk fills up very quickly.

          These nodes were told to write a replica; then, minutes later, it was deleted. Minutes went by, and the same block with the identical gen stamp was transferred to the node. All of these operations were successful. In the next block scanner scan period, however, the thread gets into a seemingly infinite loop of verifying this block replica, after verifying some number of blocks.

          Kihwal Lee added a comment -

          I took a heap dump of a datanode with this problem. Its blockInfoSet contained two entries for the block it was spinning on: the left-most (oldest) entry was one, and the right-most (newest) was the other. The oldest one was not in the blockMap, but the newest one was.

          At some point in the past, there must have been a duplicate insertion. As soon as the older entry becomes the oldest one in the tree after scanning some blocks, reinsertion of this block info with an updated scan stat only replaces the newer one. This leaves the oldest entry in place, causing repeated scan verification of the same block.

          I will review the patch to see whether it will address this problem as well.

          Kihwal Lee added a comment -

          The scan spins because getEarliestScanTime() will return the last scan time of the oldest block. Since it never gets removed, the scanner keeps calling verifyFirstBlock().

          Also, the JDK doc on TreeSet states, "Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface." This holds in branch-0.23, but is broken in branch-2.

          Tsuyoshi Ozawa added a comment -

          Thank you for investigating, Kihwal! Now I'm creating a test that can reproduce the bug you mentioned.

          Tsuyoshi Ozawa added a comment -

          Junping Du, OK. But the current reproducer patch triggers the bug only rarely, as I mentioned. I'll update the reproducer patch based on Kihwal Lee's comment.

          Kihwal Lee added a comment -

          Due to HDFS-4797, two BlockScanInfo for different blocks could collide with each other in blockInfoSet if the last scan time is the same. This can easily create inconsistencies between blockMap and blockInfoSet. This problem has been fixed recently by HDFS-5031. Other than this, I cannot find any other reason for the two data structures going out of sync.

          Tsuyoshi Ozawa, can you reproduce the problem with HDFS-5031? Your reproducer patch artificially introduces duplicate entries. That should not happen with proper synchronization. Other flaws in algorithm may cause wrong records to get removed or added, but the two data structures should stay in sync after HDFS-5031.

          If we want to make sure this is the case now and stays that way in the future, we could add sanity checks to addBlockInfo() and delBlockInfo(). For addBlockInfo(), no data structure should already contain a duplicate entry; for delBlockInfo(), both data structures should contain the entry to be deleted.
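          The HDFS-4797 collision described above can be sketched with a minimal, hypothetical example (not the actual BlockScanInfo code): when a TreeSet's comparator looks only at the last scan time, two distinct blocks with the same scan time compare as equal, so the second add is silently rejected and the set falls out of sync with the map.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Minimal illustration of an ordering inconsistent with equals:
// comparing only by lastScanTime makes distinct blocks "equal" to a TreeSet.
class CollisionDemo {
    static class ScanInfo {
        final long blockId;
        final long lastScanTime;
        ScanInfo(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
    }

    // Returns {map size, set size} after inserting two distinct blocks
    // that happen to share the same last scan time.
    static int[] sizesAfterCollision() {
        Map<Long, ScanInfo> blockMap = new HashMap<>();
        TreeSet<ScanInfo> blockInfoSet =
            new TreeSet<>(Comparator.comparingLong((ScanInfo i) -> i.lastScanTime));

        ScanInfo a = new ScanInfo(1, 100);
        ScanInfo b = new ScanInfo(2, 100); // different block, same scan time

        blockMap.put(a.blockId, a);
        blockInfoSet.add(a);
        blockMap.put(b.blockId, b);
        blockInfoSet.add(b); // rejected: compares equal to 'a'

        return new int[] { blockMap.size(), blockInfoSet.size() };
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterCollision();
        // The map and the set now disagree about how many blocks exist.
        System.out.println("map=" + sizes[0] + " set=" + sizes[1]); // map=2 set=1
    }
}
```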

          Kihwal Lee added a comment -

          In HDFS-5031, Vinay explained,

          BlockScanInfo.equals() strictly checks for an instance of BlockScanInfo, but almost all retrievals from blockMap are done using an instance of Block, so they always get a null value and hence the scan will happen again.

          Since the check returns null, addBlockInfo() can be called without removing the existing entry first. This replaces the existing entry in blockMap, since it uses Block's hash code. However, blockInfoSet can end up with a duplicate entry for the block, since it is keyed on the last scan time. If the older entry becomes the first in the ordered set, "is no longer in the dataset" will show up repeatedly if the block was deleted. If it was not deleted, "Verification succeeded for..." will appear.

          I will pull HDFS-5031 into branch-2.1-beta.
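          The duplicate-insertion mechanism can be sketched as follows (a hypothetical simplification, not the real BlockScanInfo code): re-adding an updated entry without first deleting the old one replaces the HashMap entry (same block id key) but adds a second TreeSet entry (different scan time), and the stale one eventually becomes the first element the scanner sees.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical simplification of how blockMap and blockInfoSet diverge
// when an updated entry is re-added without deleting the old one first.
class DuplicateEntryDemo {
    static class ScanInfo {
        final long blockId;
        final long lastScanTime;
        ScanInfo(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
    }

    static final Map<Long, ScanInfo> blockMap = new HashMap<>();
    static final TreeSet<ScanInfo> blockInfoSet = new TreeSet<>(
        Comparator.comparingLong((ScanInfo i) -> i.lastScanTime)
                  .thenComparingLong(i -> i.blockId));

    static void addBlockInfo(ScanInfo info) {
        blockMap.put(info.blockId, info);  // replaces by block id
        blockInfoSet.add(info);            // keyed on scan time: may duplicate
    }

    public static void main(String[] args) {
        addBlockInfo(new ScanInfo(1, 100));
        // The lookup that should find the old entry returned null
        // (strict equals), so it is never removed before re-adding:
        addBlockInfo(new ScanInfo(1, 200));

        System.out.println(blockMap.size());                    // 1
        System.out.println(blockInfoSet.size());                // 2
        System.out.println(blockInfoSet.first().lastScanTime);  // 100 (stale)
    }
}
```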

          Kihwal Lee added a comment -

          I am resolving this as a duplicate of HDFS-5031 after merging the patch to branch-2.1-beta.

          Tsuyoshi Ozawa added a comment -

          Kihwal Lee, thanks for the explanation. The point I mentioned was:

                  if (((now - getEarliestScanTime()) >= scanPeriod)
                      || ((!blockInfoSet.isEmpty()) && !(this.isFirstBlockProcessed()))) {
                    verifyFirstBlock();
                  } 
          

          but this is not a problem, because isFirstBlockProcessed() correctly checks whether blockInfoSet is empty under synchronization. LGTM. Roman Shaposhnik, what do you think?

          Roman Shaposhnik added a comment -

          Once this shows up in the branch for beta 2.1.2 I can re-run my tests in Bigtop and let you guys know.

          Lars Francke added a comment -

          We're being hit by this issue at the moment. I didn't follow all the explanations so I'm unsure what to do now.

          Is changing log settings for that class the correct "workaround" or is there anything else we can do to break the loop?

          Kihwal Lee added a comment -

          Lars Francke, what is the version of Hadoop you are using?

          Lars Francke added a comment -

          We're running CDH 4.5.0 which is using Hadoop 2.0. I see that a fix for this issue is in CDH 4.6 but that's not released yet.


            People

            • Assignee:
              Tsuyoshi Ozawa
              Reporter:
              Roman Shaposhnik
            • Votes:
              0
              Watchers:
              12

              Dates

              • Created:
                Updated:
                Resolved:

                Development