Fix NPE in FileSinkOperator from hashcode mismatch

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 1.2.0
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      A NullPointerException occurs in FileSinkOperator when inserting into bucketed tables using distribute by with multiFileSpray enabled. The following query reproduces the issue:

      set hive.enforce.bucketing = true;
      set hive.exec.reducers.max = 20;
      
      create table bucket_a(key int, value_a string) clustered by (key) into 256 buckets;
      create table bucket_b(key int, value_b string) clustered by (key) into 256 buckets;
      create table bucket_ab(key int, value_a string, value_b string) clustered by (key) into 256 buckets;
      
      -- Insert data into bucket_a and bucket_b
      
      insert overwrite table bucket_ab
      select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key = b.key) distribute by key;
      

      The following stack trace is logged.

      2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer (ExecReducer.java:reduce(255)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"113","_col1":"val_113"}}
      	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
      	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
      	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.NullPointerException
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
      	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
      	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
      	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
      	... 8 more
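
The trace ends in findWriterOffset, and the issue title attributes the failure to a hashcode mismatch. One way such a mismatch can surface as a NullPointerException is sketched below. This is a simplified model, not Hive's actual code: the class WriterOffsetSketch, the helpers buildBucketMap and bucketFor, and the modulo routing rule are all assumptions made for illustration. The only Java behavior relied on is that Map.get returns null for a missing key and that unboxing a null Integer to int throws.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a reducer that writes several bucket files.
public class WriterOffsetSketch {

    // Assumed routing rule: bucket b is handled by reducer (b % numReducers).
    // The reducer records a writer index only for the buckets it owns.
    static Map<Integer, Integer> buildBucketMap(int totalFiles, int numReducers, int reducerId) {
        Map<Integer, Integer> bucketMap = new HashMap<>();
        int writerIndex = 0;
        for (int bucket = 0; bucket < totalFiles; bucket++) {
            if (bucket % numReducers == reducerId) {
                bucketMap.put(bucket, writerIndex++);
            }
        }
        return bucketMap;
    }

    // Map a row's hash code to a bucket number (non-negative modulo).
    static int bucketFor(int hashCode, int totalFiles) {
        return (hashCode & Integer.MAX_VALUE) % totalFiles;
    }

    // Return type is int, so a missing entry (null) fails during auto-unboxing.
    static int findWriterOffset(Map<Integer, Integer> bucketMap, int hashCode, int totalFiles) {
        return bucketMap.get(bucketFor(hashCode, totalFiles));
    }

    public static void main(String[] args) {
        // 256 buckets and 20 reducers, as in the reproduction query; this is reducer 0.
        Map<Integer, Integer> bucketMap = buildBucketMap(256, 20, 0);

        // Hash consistent with the routing: bucket 40 belongs to reducer 0.
        System.out.println("offset for hash 40: " + findWriterOffset(bucketMap, 40, 256));

        // Hash computed differently on the sink side: bucket 113 belongs to reducer 13,
        // so reducer 0 has no entry for it and the lookup yields null.
        try {
            findWriterOffset(bucketMap, 113, 256);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for hash 113");
        }
    }
}
```

In this model the failure disappears once the side that routes rows and the side that looks up the writer derive the bucket number from the same hash code.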
      

        Attachments

        1. HIVE-10538.1.patch
          12 kB
          Prasanth Jayachandran
        2. HIVE-10538.1.patch
          12 kB
          Prasanth Jayachandran
        3. HIVE-10538.1.patch
          12 kB
          Peter Slawski
        4. HIVE-10538.2.patch
          21 kB
          Peter Slawski
        5. HIVE-10538.3.patch
          19 kB
          Prasanth Jayachandran

            People

            • Assignee:
              petersla Peter Slawski
              Reporter:
              petersla Peter Slawski
            • Votes:
              0
              Watchers:
              6
