Hadoop HDFS / HDFS-10336

TestBalancer fails intermittently because UserGroupInformation is not reset completely

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha1
    • Component/s: test
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The unit test TestBalancer fails intermittently.

      I investigated and found two main causes.

      • 1st. The test TestBalancer#testBalancerWithKeytabs timed out:
        org.apache.hadoop.hdfs.server.balancer.TestBalancer
        testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 300.41 sec  <<< ERROR!
        java.lang.Exception: test timed out after 300000 milliseconds
        	at java.lang.Thread.sleep(Native Method)
        	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
        	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
        	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
        	at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
        	at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
        	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
        	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
        
      • 2nd. The test TestBalancer#testBalancerWithKeytabs sometimes does not reset the UGI completely in its finally block. This causes other unit tests to throw an IOException, like this:
        testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 0 sec  <<< ERROR!
        java.io.IOException: Running in secure mode, but config doesn't have a keytab
        	at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
        

        More than one test can be affected by this. We should add the following line before the existing UGI reset operation to avoid the potential exception:

        UserGroupInformation.reset();
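
        To illustrate why an incomplete reset makes later tests fail, here is a minimal, self-contained sketch. It does not use Hadoop: FakeSecurityState is a hypothetical stand-in for the process-wide state held by UserGroupInformation, and login() only imitates the kind of check that produces the "Running in secure mode" error above. The names secureTest and insecureTestPasses are likewise made up for this example.

```java
import java.io.IOException;

// Hypothetical stand-in for static, process-wide security state.
// This is NOT the real org.apache.hadoop.security.UserGroupInformation.
final class FakeSecurityState {
    private static boolean securityEnabled = false;
    private static String loginUser = null;

    static void enableKerberos(String user) {
        securityEnabled = true;
        loginUser = user;
    }

    // Clears every piece of static state, so nothing leaks to later tests.
    static void reset() {
        securityEnabled = false;
        loginUser = null;
    }

    static boolean isSecurityEnabled() {
        return securityEnabled;
    }
}

class UgiResetSketch {
    // Imitates a login check: secure mode without a keytab is an error.
    static void login(String keytab) throws IOException {
        if (FakeSecurityState.isSecurityEnabled() && keytab == null) {
            throw new IOException(
                "Running in secure mode, but config doesn't have a keytab");
        }
    }

    // Models a test that switches to secure mode, like a keytab-based test.
    static void secureTest(boolean resetInFinally) {
        try {
            FakeSecurityState.enableKerberos("balancer-user");
            // ... the body of the secure test would run here ...
        } finally {
            if (resetInFinally) {
                FakeSecurityState.reset(); // the added clean-up line
            }
        }
    }

    // Models a later, non-secure test that has no keytab configured.
    static boolean insecureTestPasses() {
        try {
            login(null);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        secureTest(false);
        System.out.println("without reset: " + insecureTestPasses()); // false
        FakeSecurityState.reset();
        secureTest(true);
        System.out.println("with reset: " + insecureTestPasses()); // true
    }
}
```

        In this sketch the later test fails only when the secure test leaves static state behind, which matches the symptom described above: the failure depends on which test ran earlier in the same JVM.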
        

      Attachments

        1. HDFS-10336.001.patch
          2 kB
          Yiqun Lin
        2. HDFS-10336.002.patch
          1 kB
          Yiqun Lin
        3. HDFS-10336.003.patch
          1 kB
          Yiqun Lin
        4. HDFS-10336.003-simplefix.patch
          0.7 kB
          Yiqun Lin

            People

              Assignee: Yiqun Lin (linyiqun)
              Reporter: Yiqun Lin (linyiqun)