Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-14303

CommonJoinOperator.checkAndGenObject should return directly to avoid NPE if ExecReducer.close is called twice.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels:
      None

      Description

      CommonJoinOperator.checkAndGenObject should return directly (after CommonJoinOperator.closeOp was called ) to avoid NPE if ExecReducer.close is called twice. ExecReducer.close implements Closeable interface and ExecReducer.close can be called multiple time. We saw the following NPE which hide the real exception due to this bug.

      Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: null
      
              at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
      
              at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
      
              at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
      
              at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
      
              at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
      
              at java.security.AccessController.doPrivileged(Native Method)
      
              at javax.security.auth.Subject.doAs(Subject.java:415)
      
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
      
              at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      
      Caused by: java.lang.NullPointerException
      
              at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
      
              at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
      
              at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
      
              ... 8 more
      

      The code from ReduceTask.runOldReducer:

            reducer.close(); //line 453
            reducer = null;
            
            out.close(reporter);
            out = null;
          } finally {
            IOUtils.cleanup(LOG, reducer);// line 459
            closeQuietly(out, reporter);
          }
      

      Based on the above stack trace and code, reducer.close() is called twice because the exception happened when reducer.close() is called for the first time at line 453, the code exit before reducer was set to null. NullPointerException is triggered when reducer.close() is called for the second time in IOUtils.cleanup at line 459. NullPointerException hide the real exception which happened when reducer.close() is called for the first time at line 453.
      The reason for NPE is:
      The first reducer.close called CommonJoinOperator.closeOp which clear storage

      Arrays.fill(storage, null);
      

      the second reduce.close generated NPE due to null storage[alias] which is set to null by first reducer.close.
      The following reducer log can give more proof:

      2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... 
      2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... 
      2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
      2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing... 
      2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing... 
      2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing... 
      2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing... 
      2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[4]: records written - 53466
      2016-07-14 22:25:11,555 ERROR [main] ExecReducer: Hit error while closing operators - failing tree
      2016-07-14 22:25:11,649 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Hive Runtime Error while closing operators: null
      	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
      	at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
      	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
      	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      Caused by: java.lang.NullPointerException
      	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
      	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
      	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
      	... 8 more
      

       

        Attachments

        1. HIVE-14303.2.patch
          2 kB
          zhihai xu
        2. HIVE-14303.1.patch
          2 kB
          zhihai xu
        3. HIVE-14303.0.patch
          0.7 kB
          zhihai xu

          Activity

            People

            • Assignee:
              zxu zhihai xu
              Reporter:
              zxu zhihai xu
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: