Hive / HIVE-4011

Sort Merge Join runs locally


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0, 0.10.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Environment: Linux

    Description

      With the settings required for a Sort Merge Join in place, the join between two bucketed, partitioned tables does not run as a Sort Merge Join. Instead it falls back to a MapJoin whose first stage runs locally.

      I first ran into the issue on Hive 0.9 at large scale. To confirm that it persists, I reproduced it on Hive 0.10 with sample public data and regular storage formats.

      More details:

      set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
      set hive.optimize.bucketmapjoin = true;
      set hive.optimize.bucketmapjoin.sortedmerge = true;

      select /*+ MAPJOIN(l) */
      l.stock_price_open lo,
      r.stock_price_open ro
      from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r ON (l.year = r.year and l.stock_symbol = r.stock_symbol and l.dte=r.dte)
      where ...

      DDL (identical for both tables):

      PARTITIONED BY (year string)
      CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
      STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'

      I also made sure the following were set:

      set hive.enforce.bucketing=true;
      set hive.enforce.sorting=true;

      Run logs and further details are in the attached file.
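      For context on the expected behaviour, here is a minimal sketch of what a sort-merge bucket join does, assuming both tables hash the join key into the same number of buckets and each bucket is already sorted on that key. It is a generic Python illustration, not Hive's operator code. The function names `merge_join_bucket` and `smb_join` are hypothetical, and a single join column stands in for the composite key used in the query above. The point is that each pair of buckets is merged with two cursors and no hash table, so no local first stage should be needed.

```python
from typing import Any, Iterator, List, Sequence, Tuple

Row = Tuple[Any, ...]


def merge_join_bucket(left: Sequence[Row], right: Sequence[Row],
                      key: int = 0) -> Iterator[Tuple[Row, Row]]:
    """Inner-join two row lists that are both sorted on column `key`.

    Hypothetical helper: walks each input once with two cursors and
    builds no hash table.
    """
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows with this key on the right side ...
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # ... and pair it with every left row carrying the same key.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    yield left[i], r
                i += 1
            j = j_end


def smb_join(left_buckets: List[Sequence[Row]],
             right_buckets: List[Sequence[Row]],
             key: int = 0) -> Iterator[Tuple[Row, Row]]:
    """Join bucket n of one table only with bucket n of the other.

    Simplifying assumption: both tables have the same bucket count.
    """
    if len(left_buckets) != len(right_buckets):
        raise ValueError("sketch assumes equal bucket counts")
    for lb, rb in zip(left_buckets, right_buckets):
        yield from merge_join_bucket(lb, rb, key)
```

      Example with made-up (stock_symbol, stock_price_open) rows in two buckets: smb_join([[("AA", 10.0), ("AB", 5.0)], [("BA", 7.0)]], [[("AA", 11.0), ("AC", 2.0)], [("BA", 8.0), ("BA", 9.0)]]) yields the pairs for AA and BA only.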

      Attachments

        1. SMJ-JIRA-4011.txt
          10 kB
          Amir Youssefi


          People

            Assignee: Unassigned
            Reporter: Amir Youssefi (amirhyoussefi)
            Votes: 0
            Watchers: 2
