Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Invalid
- Affects Version/s: 2.8.0, 3.0.0-alpha2
- Fix Version/s: None
- Component/s: None
- Hadoop Flags: Reviewed
- Release Note: If a directory with subdirectories was removed from a directory that has a snapshot containing the removed subdirectory, hdfs dfs -du on any ancestor of the removed directories ran into a ConcurrentModificationException and failed.
Description
HDFS-10797 fixed a disk usage summary (-du) bug, but it introduced a new one.
The bug can be reproduced by running the following commands:
    bash-4.1$ hdfs dfs -mkdir /tmp/d0
    bash-4.1$ hdfs dfsadmin -allowSnapshot /tmp/d0
    Allowing snaphot on /tmp/d0 succeeded
    bash-4.1$ hdfs dfs -touchz /tmp/d0/f4
    bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1
    bash-4.1$ hdfs dfs -createSnapshot /tmp/d0 s1
    Created snapshot /tmp/d0/.snapshot/s1
    bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d2
    bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d3
    bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d2/d4
    bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d3/d5
    bash-4.1$ hdfs dfs -createSnapshot /tmp/d0 s2
    Created snapshot /tmp/d0/.snapshot/s2
    bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d2/d4
    bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d2
    bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d3/d5
    bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d3
    bash-4.1$ hdfs dfs -du -h /tmp/d0
    du: java.util.ConcurrentModificationException
    0  0  /tmp/d0/f4
A ConcurrentModificationException forced du to terminate abruptly.
Correspondingly, the NameNode log contains the following error:
    2017-03-08 14:32:17,673 WARN org.apache.hadoop.ipc.Server: IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getContentSummary from 10.0.0.198:49957 Call#2 Retry#0
    java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
        at java.util.HashMap$KeyIterator.next(HashMap.java:956)
        at org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.tallyDeletedSnapshottedINodes(ContentSummaryComputationContext.java:209)
        at org.apache.hadoop.hdfs.server.namenode.INode.computeAndConvertContentSummary(INode.java:507)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getContentSummary(FSDirectory.java:2302)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:4535)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1087)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getContentSummary(AuthorizationProviderProxyClientProtocol.java:563)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:873)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
The bug is due to an improper use of HashSet, not to concurrent operations. A HashSet cannot be structurally modified while an iterator is traversing it; the iterator fails fast with a ConcurrentModificationException on its next access.
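The fail-fast behavior is easy to reproduce outside the NameNode. The following is a minimal illustrative sketch, not the HDFS code (the class and method names here are made up): removing from a HashSet inside a for-each loop throws, while removing through the iterator itself does not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class CmeDemo {

    // A for-each loop uses the set's iterator underneath; calling
    // Set.remove() mid-loop changes modCount and trips the iterator's
    // fail-fast check on its next access.
    static boolean triggersCme() {
        Set<String> dirs = new HashSet<>(Set.of("d2", "d3", "d4", "d5"));
        try {
            for (String dir : dirs) {
                dirs.remove("d5"); // structural modification mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // The safe pattern: remove through the iterator itself (or collect
    // removals and apply them after the loop), keeping modCount consistent.
    static boolean safeRemoveWorks() {
        Set<String> dirs = new HashSet<>(Set.of("d2", "d3", "d4", "d5"));
        for (Iterator<String> it = dirs.iterator(); it.hasNext(); ) {
            if (it.next().equals("d5")) {
                it.remove();
            }
        }
        return dirs.equals(Set.of("d2", "d3", "d4"));
    }

    public static void main(String[] args) {
        System.out.println("triggersCme=" + triggersCme());
        System.out.println("safeRemoveWorks=" + safeRemoveWorks());
    }
}
```

Note the exception is best-effort fail-fast behavior of the non-concurrent java.util collections, so it can surface even in single-threaded code such as the content summary computation here.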
Attachments
Issue Links
- breaks: HDFS-10797 Disk usage summary of snapshots causes renamed blocks to get counted twice (Resolved)
- is depended upon by: HDFS-11787 After HDFS-11515, -du still throws ConcurrentModificationException (Resolved)
- is related to: HDFS-11661 GetContentSummary uses excessive amounts of memory (Resolved)