HIVE-3171: Bucketed sort merge join doesn't work when multiple files exist for small alias

Details

    • Reviewed

    Description

      Executing a query with the MAPJOIN hint works fine on partitioned tables that have only one partition when the bucketed sort merge join optimizations are enabled as follows:

      set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
      set hive.optimize.bucketmapjoin = true;
      set hive.optimize.bucketmapjoin.sortedmerge = true;

      However, once a second partition is added to the table, Hive falls back to a regular map-side join, which can fail because the tables are too large. Hive ought to still be able to perform the bucketed sort merge join on tables with multiple partitions.
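
The requested behavior can be illustrated with a small Python sketch. It is not Hive's implementation and not taken from the attached patches. It assumes that each partition of the small table contributes one key-sorted file per bucket, and that those files can be merged into a single sorted stream before the merge join runs. The names `merged_bucket` and `sort_merge_join` and the sample rows are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merged_bucket(partition_files):
    """Merge the key-sorted files of one bucket (assumed: one file per
    partition) into a single key-sorted stream of (key, value) rows."""
    return heapq.merge(*partition_files, key=itemgetter(0))


def sort_merge_join(big_rows, small_rows):
    """Inner-join two key-sorted streams of (key, value) rows.

    Yields (key, big_value, small_value) for every matching pair.
    """
    # Group the small side by key so duplicate keys are handled together.
    small_groups = (
        (key, [value for _, value in group])
        for key, group in groupby(small_rows, key=itemgetter(0))
    )
    current = next(small_groups, None)
    for key, big_value in big_rows:
        # Advance the small side until its key catches up with the big side.
        while current is not None and current[0] < key:
            current = next(small_groups, None)
        if current is None:
            break  # small side exhausted, no further matches possible
        if current[0] == key:
            for small_value in current[1]:
                yield key, big_value, small_value


# Hypothetical data: the same bucket of the small table, stored in two partitions.
small_partition_1 = [(1, "a"), (4, "d")]
small_partition_2 = [(2, "b"), (4, "e")]
big_bucket = [(1, "x"), (2, "y"), (3, "z"), (4, "w")]

joined = list(
    sort_merge_join(big_bucket, merged_bucket([small_partition_1, small_partition_2]))
)
```

With a single partition the merge step is a no-op, which matches the case that already works. With several partitions, merging first keeps the small side sorted, so the join can still stream both inputs instead of loading the whole small table into memory.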

      Attachments

        1. HIVE-3171.1.patch.txt
          82 kB
          Navis Ryu
        2. HIVE-3171.2.patch.txt
          88 kB
          Navis Ryu

            People

              navis Navis Ryu
              fwiffo Joey Echeverria
              Votes: 0
              Watchers: 7