  1. Hadoop HDFS
  2. HDFS-11515

-du throws ConcurrentModificationException


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.8.0, 3.0.0-alpha2
    • Fix Version/s: None
    • Component/s: namenode, shell
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: If a directory with subdirectories was removed from a directory that has a snapshot containing the removed subdirectories, hdfs dfs -du on any ancestor of the removed directories ran into a ConcurrentModificationException and failed.

    Description

      HDFS-10797 fixed a disk summary (-du) bug, but it introduced a new bug.

      The bug can be reproduced by running the following commands:

      bash-4.1$ hdfs dfs -mkdir /tmp/d0
      bash-4.1$ hdfs dfsadmin -allowSnapshot /tmp/d0
      Allowing snaphot on /tmp/d0 succeeded
      bash-4.1$ hdfs dfs -touchz /tmp/d0/f4
      bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1
      bash-4.1$ hdfs dfs -createSnapshot /tmp/d0 s1
      Created snapshot /tmp/d0/.snapshot/s1
      bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d2
      bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d3
      bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d2/d4
      bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d3/d5
      bash-4.1$ hdfs dfs -createSnapshot /tmp/d0 s2
      Created snapshot /tmp/d0/.snapshot/s2
      bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d2/d4
      bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d2
      bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d3/d5
      bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d3
      bash-4.1$ hdfs dfs -du -h /tmp/d0
      du: java.util.ConcurrentModificationException
      0 0 /tmp/d0/f4
      

      A ConcurrentModificationException forced du to terminate abruptly.

      Correspondingly, the NameNode log has the following error:

      2017-03-08 14:32:17,673 WARN org.apache.hadoop.ipc.Server: IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getContentSummary from 10.0.0.198:49957 Call#2 Retry#0
      java.util.ConcurrentModificationException
              at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
              at java.util.HashMap$KeyIterator.next(HashMap.java:956)
              at org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.tallyDeletedSnapshottedINodes(ContentSummaryComputationContext.java:209)
              at org.apache.hadoop.hdfs.server.namenode.INode.computeAndConvertContentSummary(INode.java:507)
              at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getContentSummary(FSDirectory.java:2302)
              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:4535)
              at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1087)
              at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getContentSummary(AuthorizationProviderProxyClientProtocol.java:563)
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:873)
              at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
      

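      The bug is due to an improper use of HashSet, not to concurrent operations. A HashSet cannot be structurally modified (elements added or removed) while an iterator is traversing it, except through the iterator's own remove method. The iterator is fail-fast: its next call after such a modification throws ConcurrentModificationException, even when only one thread is involved.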
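
      The failure mode can be shown outside of HDFS. The following is a minimal sketch, not the NameNode code: the class name, the integer ids, and the "id + 100" rule for discovering new entries are all hypothetical stand-ins. It first adds to a HashSet from inside a for-each loop over that same set, which triggers the exception in a single thread, and then shows one possible way to avoid it by draining a separate work queue instead of iterating over the set being modified.

```java
import java.util.ArrayDeque;
import java.util.ConcurrentModificationException;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Hadoop.
final class HashSetIterationSketch {

    // Adds to the set while a for-each loop (an implicit iterator) is
    // traversing it. Returns true if the iterator detected the change.
    static boolean unsafeTraversalFails() {
        Set<Integer> pending = new HashSet<>();
        pending.add(1);
        pending.add(2);
        try {
            for (Integer id : pending) {
                // Structural modification during iteration: the iterator's
                // next call notices the changed modification count.
                pending.add(id + 100);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Processes the same ids through a work queue, so newly discovered
    // entries are appended to the queue and the set is never iterated
    // while it is being modified.
    static Set<Integer> safeTraversal() {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        worklist.add(1);
        worklist.add(2);
        while (!worklist.isEmpty()) {
            Integer id = worklist.poll();
            // Assumed rule for this sketch: ids below 100 "discover" id + 100.
            if (seen.add(id) && id < 100) {
                worklist.add(id + 100);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("unsafe traversal threw: " + unsafeTraversalFails());
        System.out.println("safe traversal visited: " + safeTraversal().size() + " ids");
    }
}
```

      Copying the set before iterating (for example into a new ArrayList) would also avoid the exception, but entries added during the loop would then not be visited; the work queue is used here because it handles entries discovered mid-traversal.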

      Attachments

        1. HDFS-11515.004.patch
          7 kB
          István Fajth
        2. HDFS-11515.003.patch
          6 kB
          István Fajth
        3. HDFS-11515.002.patch
          6 kB
          István Fajth
        4. HDFS-11515.001.patch
          6 kB
          István Fajth
        5. HDFS-11515.test.patch
          2 kB
          Wei-Chiu Chuang


          People

            Assignee: pifta István Fajth
            Reporter: weichiu Wei-Chiu Chuang