[HIVE-968] map join may lead to very large files - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.5.0
Component/s: Query Processor
Labels: None
Hadoop Flags: Reviewed

Description

If the table being map-joined is very large, it can produce very large files on the mappers.
The job may never complete, and the files will never be cleaned from the tmp directory.
It would be great if the table could be placed in the distributed cache, but at a minimum the following should be added:

If the table (source) being joined leads to a very big file, the job should just throw an error.
New configuration parameters can be added to limit the number of rows or the size of the table.
The mapper should not be retried 4 times; it should fail immediately.
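
The row-limit idea above could be sketched as follows. This is only an illustration, not the code in the attached patches: `load_small_table`, `max_rows`, and `MapJoinTableTooLarge` are hypothetical names, and a plain dictionary stands in for the mapper's join table.

```python
class MapJoinTableTooLarge(Exception):
    """Raised when the joined (source) table exceeds the configured row limit."""


def load_small_table(rows, max_rows):
    """Build the join lookup table, failing as soon as the limit is exceeded.

    rows     -- iterable of (join_key, value) pairs from the source table
    max_rows -- hypothetical configuration value; a negative value disables the check
    """
    table = {}
    count = 0
    for key, value in rows:
        count += 1
        if max_rows >= 0 and count > max_rows:
            # Fail immediately instead of continuing to spill rows to disk.
            raise MapJoinTableTooLarge(
                "source table has more than %d rows" % max_rows
            )
        table.setdefault(key, []).append(value)
    return table
```

Raising on the first row past the limit keeps the failure cheap; whether the framework then retries the task is a separate concern that this sketch does not address.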

I can't think of a better way for the mapper to communicate with the client than to write to some well-known
HDFS file. The client can read the file periodically (while polling) and, if it sees an error, kill the job with an
appropriate error message indicating why the job died.
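
A minimal sketch of that error-file handshake is shown below. It is an assumption-laden illustration: the local filesystem stands in for HDFS, and `report_fatal_error`, `poll_job`, `job_is_done`, and `kill_job` are hypothetical names rather than Hive or Hadoop APIs.

```python
import os
import time


def report_fatal_error(error_path, message):
    """Mapper side: record why the task is giving up."""
    tmp_path = error_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(message)
    # Rename into place so the client never reads a half-written file.
    os.replace(tmp_path, error_path)


def poll_job(error_path, job_is_done, kill_job, interval_seconds=1.0):
    """Client side: poll until the job finishes or an error file appears.

    job_is_done and kill_job are caller-supplied callables (hypothetical).
    Returns the error message if the job was killed, otherwise None.
    """
    while True:
        if os.path.exists(error_path):
            with open(error_path) as f:
                message = f.read()
            kill_job()
            return message
        if job_is_done():
            return None
        time.sleep(interval_seconds)
```

The client checks for the error file before checking for completion, so a failure reported just before the job ends is still surfaced to the user.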

Attachments

HIVE-968.patch    04/Dec/09 20:10  13 kB  Ning Zhang
HIVE-968_4.patch  10/Dec/09 00:19  45 kB  Ning Zhang
HIVE-968_3.patch  09/Dec/09 01:52  39 kB  Ning Zhang
HIVE-968_2.patch  05/Dec/09 00:50  22 kB  Ning Zhang

People

Assignee: Ning Zhang

Reporter: Namit Jain

Votes: 0

Watchers: 0

Dates

Created: 03/Dec/09 02:46

Updated: 17/Dec/11 00:06

Resolved: 10/Dec/09 05:50