  1. Hadoop HDFS
  2. HDFS-10336

TestBalancer fails intermittently because UserGroupInformation is not reset completely

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha1
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The unit test TestBalancer fails intermittently.

      I looked into it and found two main causes.

      • 1st. The test TestBalancer#testBalancerWithKeytabs timed out:
        org.apache.hadoop.hdfs.server.balancer.TestBalancer
        testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 300.41 sec  <<< ERROR!
        java.lang.Exception: test timed out after 300000 milliseconds
        	at java.lang.Thread.sleep(Native Method)
        	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
        	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
        	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
        	at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
        	at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
        	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
        	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
        
      • 2nd. The test TestBalancer#testBalancerWithKeytabs sometimes does not reset the UGI completely in its finally block. This causes other unit tests to throw an IOException, like this:
        testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 0 sec  <<< ERROR!
        java.io.IOException: Running in secure mode, but config doesn't have a keytab
        	at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
        

        More than one test is affected by this. We should add the following line before the existing UGI reset operation in the finally block, which avoids the potential exception:

        UserGroupInformation.reset();
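
        The general failure mode described above, where static state set by one test leaks into the next because the finally block only partly cleans up, can be sketched without Hadoop on the classpath. This is a minimal model under stated assumptions: FakeLoginState and StaticStateResetDemo are hypothetical names invented for illustration, they are not Hadoop's UserGroupInformation, and the report does not say which internal field actually survives the incomplete reset.

```java
// Hypothetical stand-in for a class that keeps process-wide login state in
// static fields. This is NOT Hadoop's UserGroupInformation; all names here
// are invented for illustration only.
final class FakeLoginState {
    private static boolean securityEnabled = false; // derived from configuration
    private static String loginUser = null;         // cached after a login

    static void configure(boolean secure) { securityEnabled = secure; }
    static void login(String user) { loginUser = user; }
    static boolean isSecurityEnabled() { return securityEnabled; }
    static String getLoginUser() { return loginUser; }

    // Full reset: clears every static field, not just the configuration.
    static void reset() {
        securityEnabled = false;
        loginUser = null;
    }
}

class StaticStateResetDemo {
    // A "secure" test whose finally block restores only the configuration.
    static void secureTestWithPartialCleanup() {
        try {
            FakeLoginState.configure(true);
            FakeLoginState.login("balancer-user");
        } finally {
            FakeLoginState.configure(false); // the cached login user survives
        }
    }

    // The same test, but the finally block resets all static state first.
    static void secureTestWithFullCleanup() {
        try {
            FakeLoginState.configure(true);
            FakeLoginState.login("balancer-user");
        } finally {
            FakeLoginState.reset();          // clear everything
            FakeLoginState.configure(false); // then restore the default config
        }
    }

    public static void main(String[] args) {
        secureTestWithPartialCleanup();
        System.out.println("after partial cleanup: " + FakeLoginState.getLoginUser());

        FakeLoginState.reset();
        secureTestWithFullCleanup();
        System.out.println("after full cleanup: " + FakeLoginState.getLoginUser());
    }
}
```

        In this model the partial cleanup leaves the cached user behind for whichever test runs next, while the full cleanup does not. The order in the second finally block mirrors the proposal in this issue: reset first, then restore the default configuration.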
        

        Attachments

        1. HDFS-10336.001.patch
          2 kB
          Yiqun Lin
        2. HDFS-10336.002.patch
          1 kB
          Yiqun Lin
        3. HDFS-10336.003.patch
          1 kB
          Yiqun Lin
        4. HDFS-10336.003-simplefix.patch
          0.7 kB
          Yiqun Lin

              People

              • Assignee:
                linyiqun Yiqun Lin
                Reporter:
                linyiqun Yiqun Lin