[HIVE-3171] Bucketed sort merge join doesn't work when multiple files exist for small alias - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.10.0
Fix Version/s: 0.10.0
Component/s: Query Processor
Labels:

Hadoop Flags: Reviewed

Description

Executing a query with the MAPJOIN hint and the bucketed sort merge join optimizations enabled:

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

works fine with partitioned tables as long as the table has only one partition. However, once a second partition is added, Hive falls back to a regular map-side join, which can fail because the tables are too large. Hive ought to still be able to do the bucketed sort merge join when a table has multiple partitions.
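A minimal reproduction sketch of the scenario described above, in HiveQL. The table names (`small_tbl`, `big_tbl`, `src`), the column layout, the bucket count, and the partition values are all hypothetical and not taken from the report. The `hive.enforce.bucketing` and `hive.enforce.sorting` settings are assumed to be needed so that the inserts actually produce bucketed, sorted files on Hive releases of this era.

```sql
-- Hypothetical setup: two tables bucketed and sorted on the join key.
-- Names, columns, and the bucket count are illustrative assumptions.
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;

CREATE TABLE small_tbl (key INT, value STRING)
PARTITIONED BY (ds STRING)
CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;

CREATE TABLE big_tbl (key INT, value STRING)
PARTITIONED BY (ds STRING)
CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;

-- src is a hypothetical source table with (key INT, value STRING).
INSERT OVERWRITE TABLE small_tbl PARTITION (ds='1') SELECT key, value FROM src;
INSERT OVERWRITE TABLE big_tbl PARTITION (ds='1') SELECT key, value FROM src;

-- Adding a second partition gives the hinted (small) alias more than one
-- file per bucket number, which is the case this issue is about.
INSERT OVERWRITE TABLE small_tbl PARTITION (ds='2') SELECT key, value FROM src;

-- Settings quoted from the description.
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

-- Inspect the plan: the expectation is a sort merge bucket join,
-- not a regular map-side join.
EXPLAIN
SELECT /*+ MAPJOIN(a) */ a.key, a.value, b.value
FROM small_tbl a JOIN big_tbl b ON a.key = b.key;
```

Running the `EXPLAIN` once with only partition `ds='1'` loaded and again after loading `ds='2'` should make the change of join strategy visible in the plan, if the behaviour matches the report.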

Attachments

- HIVE-3171.1.patch.txt (24/Aug/12 01:23, 82 kB, Navis Ryu)
- HIVE-3171.2.patch.txt (06/Sep/12 00:13, 88 kB, Navis Ryu)

Issue Links

blocks

- HIVE-3290 BucketizedHiveInputFormat should support combining files having same bucket number (Open)

is blocked by

- HIVE-3218 Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly (Closed)
- HIVE-3210 Support Bucketed mapjoin on partitioned table which has two or more partitions (Closed)

People

Assignee: Navis Ryu
Reporter: Joey Echeverria
Votes: 0
Watchers: 7

Dates

Created: 21/Jun/12 18:09
Updated: 10/Jan/13 19:53
Resolved: 07/Sep/12 17:41