Hive / HIVE-6668

When auto join convert is on and noconditionaltask is off, ConditionalResolverCommonJoin fails to resolve map joins.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.13.0, 0.14.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

    Description

      I tried the following query today:

      set mapred.job.map.memory.mb=2048;
      set mapred.job.reduce.memory.mb=2048;
      set mapred.map.child.java.opts=-server -Xmx3072m -Djava.net.preferIPv4Stack=true;
      set mapred.reduce.child.java.opts=-server -Xmx3072m -Djava.net.preferIPv4Stack=true;
      
      set mapred.reduce.tasks=60;
      
      set hive.stats.autogather=false;
      set hive.exec.parallel=false;
      set hive.enforce.bucketing=true;
      set hive.enforce.sorting=true;
      set hive.map.aggr=true;
      set hive.optimize.bucketmapjoin=true;
      set hive.optimize.bucketmapjoin.sortedmerge=true;
      set hive.mapred.reduce.tasks.speculative.execution=false;
      set hive.auto.convert.join=true;
      set hive.auto.convert.sortmerge.join=true;
      set hive.auto.convert.sortmerge.join.noconditionaltask=false;
      set hive.auto.convert.join.noconditionaltask=false;
      set hive.auto.convert.join.noconditionaltask.size=100000000;
      set hive.optimize.reducededuplication=true;
      set hive.optimize.reducededuplication.min.reducer=1;
      set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
      set hive.mapjoin.smalltable.filesize=45000000;
      
      set hive.optimize.index.filter=false;
      set hive.vectorized.execution.enabled=false;
      set hive.optimize.correlation=false;
      select
         i_item_id,
         s_state,
         avg(ss_quantity) agg1,
         avg(ss_list_price) agg2,
         avg(ss_coupon_amt) agg3,
         avg(ss_sales_price) agg4
      FROM store_sales
      JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
      JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
      JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
      JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
      where
         cd_gender = 'F' and
         cd_marital_status = 'U' and
         cd_education_status = 'Primary' and
         d_year = 2002 and
         s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
      group by i_item_id, s_state with rollup
      order by
         i_item_id,
         s_state
      limit 100;
      

      The log shows:

      14/03/14 17:05:02 INFO plan.ConditionalResolverCommonJoin: Failed to resolve driver alias (threshold : 45000000, length mapping : {store=94175, store_sales=48713909726, item=39798667, customer_demographics=1660831, date_dim=2275902})
      Stage-27 is filtered out by condition resolver.
      14/03/14 17:05:02 INFO exec.Task: Stage-27 is filtered out by condition resolver.
      Stage-28 is filtered out by condition resolver.
      14/03/14 17:05:02 INFO exec.Task: Stage-28 is filtered out by condition resolver.
      Stage-3 is selected by condition resolver.
      

      Stage-3 is a reduce-side (common) join. The resolver should have picked the map join instead.
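      The "length mapping" in the log line above gives enough numbers to check the expectation by hand. The sketch below is a simplified model, not Hive's actual ConditionalResolverCommonJoin code: it assumes an alias can drive the join (be streamed as the big table) only when the combined size of every other alias is at or below the threshold, which here is 45000000 and matches hive.mapjoin.smalltable.filesize. The class DriverAliasCheck and the method resolveDriverAlias are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the driver-alias check; NOT Hive's real resolver.
// Assumption: an alias may be the big (streamed) table only if the summed sizes
// of all other aliases fit within the small-table threshold.
public class DriverAliasCheck {

    // Returns the largest alias that satisfies the assumed rule, or null if none does.
    static String resolveDriverAlias(Map<String, Long> aliasToSize, long threshold) {
        long total = 0;
        for (long size : aliasToSize.values()) {
            total += size;
        }
        String best = null;
        long bestSize = -1;
        for (Map.Entry<String, Long> e : aliasToSize.entrySet()) {
            long others = total - e.getValue(); // sizes of every alias except this one
            if (others <= threshold && e.getValue() > bestSize) {
                best = e.getKey();
                bestSize = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sizes copied from the "length mapping" in the log above.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("store", 94175L);
        sizes.put("store_sales", 48713909726L);
        sizes.put("item", 39798667L);
        sizes.put("customer_demographics", 1660831L);
        sizes.put("date_dim", 2275902L);

        long threshold = 45000000L; // the threshold reported in the log
        long smallTables = sizes.values().stream().mapToLong(Long::longValue).sum()
                - sizes.get("store_sales");
        System.out.println("small tables total = " + smallTables);
        System.out.println("driver alias = " + resolveDriverAlias(sizes, threshold));
    }
}
```

      Running it prints "small tables total = 43829575" and "driver alias = store_sales". Under this model the four dimension tables together stay just under the 45000000-byte threshold, so store_sales qualifies as the driver alias, which is consistent with the report's expectation that a map join should have been selected over Stage-3.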

      Attachments

        1. HIVE-6668.3.patch.txt (36 kB, Navis Ryu)
        2. HIVE-6668.2.patch.txt (27 kB, Navis Ryu)
        3. HIVE-6668.1.patch.txt (1 kB, Navis Ryu)


          People

            Assignee: Navis Ryu (navis)
            Reporter: Yin Huai (yhuai)
            Votes: 0
            Watchers: 4
