Apache Tez / TEZ-3863

Possible logic problem when calculating an if statement of a nullable column

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 0.6.2
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Hive on Tez (Hive's original CLI, not beeline):
      Hadoop 2.6.2
      Hive 1.2.1
      Tez 0.6.2

    Description

      To test some count(if(...)) logic, I started a Hive (MR) CLI and a Hive (Tez) CLI and ran the following HiveQL snippet in each:
      select o1.k, o2.k, (if(o2.k>0, o2.k, null))
      from
      (
      select 1 as k union all select 0 union all select null union all select 2
      ) o1
      left outer join
      (
      select 1 as k union all select 0 union all select null
      ) o2
      on (o1.k = o2.k) ;

      Hive(MR) returns:
      1 1 1
      0 0 NULL
      NULL NULL NULL
      2 NULL NULL

      Hive(Tez) returns:
      1 1 1
      0 0 NULL
      NULL NULL NULL
      2 NULL 2

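      The last row of the Hive (Tez) result is wrong: o2.k can never be 2, because the o2 subquery only produces 1, 0 and NULL. For the unmatched row o1.k = 2, the left outer join should yield o2.k = NULL, so if(o2.k>0, o2.k, null) should also be NULL, as it is in the Hive (MR) result.
      Is this a bug, or a fix that is missing from Tez 0.6.2?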
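
      As a cross-check on the expected result, the query's semantics can be modeled outside Hive. The following is a minimal Python sketch, not Hive or Tez code: the helper names (left_outer_join, sql_gt, sql_if) are hypothetical, None stands in for SQL NULL, and it assumes standard SQL behavior (NULL keys never match in a join, a comparison with NULL yields NULL, and if() takes the else branch when the condition is not true).

```python
# Reference model of the query's expected semantics (illustrative only).
# None stands in for SQL NULL; all helper names here are hypothetical.

def left_outer_join(left, right):
    """Left outer join on key equality; NULL keys never match."""
    rows = []
    for l in left:
        matches = [r for r in right
                   if l is not None and r is not None and l == r]
        if matches:
            rows.extend((l, r) for r in matches)
        else:
            # Unmatched left row: the right side is padded with NULL.
            rows.append((l, None))
    return rows

def sql_gt(a, b):
    """SQL '>' comparison: any NULL operand yields NULL."""
    if a is None or b is None:
        return None
    return a > b

def sql_if(cond, then_val, else_val):
    """if(cond, a, b): returns a only when cond is true, otherwise b."""
    return then_val if cond is True else else_val

o1 = [1, 0, None, 2]   # select 1 union all 0 union all null union all 2
o2 = [1, 0, None]      # select 1 union all 0 union all null

result = [(a, b, sql_if(sql_gt(b, 0), b, None))
          for a, b in left_outer_join(o1, o2)]

# count(if(...)) counts only the non-NULL values of the third column.
non_null_count = sum(1 for _, _, v in result if v is not None)

for row in result:
    print(row)
print("count(if(...)) =", non_null_count)
```

      Under these assumptions the model reproduces the Hive (MR) rows, with NULL in the third column for o1.k = 2, so count(if(...)) over the third column would be 1. With the Hive (Tez) rows shown above it would be 2.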

      The Tez session was initialized with the following configuration (hiveconf):
      set mapreduce.map.memory.mb=4096;
      set mapreduce.map.java.opts=-Xmx3280m;

      set mapreduce.reduce.memory.mb=4096;
      set mapreduce.reduce.java.opts=-Xmx3280m;

      set mapreduce.framework.name=yarn-tez;
      set hive.execution.engine=tez;

      set tez.am.resource.memory.mb=4096;
      set tez.task.resource.memory.mb=4096;
      set hive.tez.container.size=4096;

      set hive.auto.convert.sortmerge.join=false;
      set hive.auto.convert.sortmerge.join.to.mapjoin=false;
      set hive.convert.join.bucket.mapjoin.tez=false;

      set hive.optimize.bucketmapjoin = false;
      set hive.optimize.bucketmapjoin.sortedmerge = false;

      set hive.enforce.sortmergebucketmapjoin=false;
      set hive.exec.submit.local.task.via.child=true;
      set hive.mapjoin.bucket.cache.size=10000;
      set hive.mapjoin.optimized.hashtable=true;

      set hive.tez.dynamic.partition.pruning=false;

      set hive.tez.java.opts=-Xmx3280m;
      set tez.am.java.opts=-Xmx3280m;
      set tez.runtime.io.sort.mb=1638;
      set tez.runtime.unordered.output.buffer.size-mb=409;

      set hive.auto.convert.join.noconditionaltask.size=1365;

      set tez.grouping.min-size=16777216;
      set tez.grouping.max-size=1073741824;

      set hive.exec.reducers.max=1099;
      set hive.exec.reducers.bytes.per.reducer=258998272;

      set mapred.max.split.size=100000000;

      set hive.support.sql11.reserved.keywords=false;
      set hive.cbo.enable=true;
      set hive.compute.query.using.stats=true;
      set hive.stats.fetch.column.stats=true;
      set hive.stats.fetch.partition.stats=true;
      set hive.vectorized.execution.enabled=true;
      set hive.vectorized.execution.reduce.enabled = true;
      set hive.vectorized.execution.reduce.groupby.enabled = true;
      set hive.exec.parallel=true;
      set hive.exec.parallel.thread.number=16;
      set hive.tez.auto.reducer.parallelism=true;
      set hive.tez.min.partition.factor=0.25;
      set hive.tez.max.partition.factor=2.0;
      set mapred.reduce.tasks=-1;
      set tez.shuffle-vertex-manager.min-src-fraction=0.25;
      set tez.shuffle-vertex-manager.max-src-fraction=0.75;

      set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
      set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

          People

            Assignee: Unassigned
            Reporter: Lu, Yao (yao_lu_eric_whysomanydups)