Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version: 0.7.1
- Component: None
- Labels: None
- Hadoop Flags: Reviewed
Description
The cause is almost certainly that an array list is used instead of a set structure in the split locations API. This looks like a bug in Hive's CombineFileInputFormat.
Reproduce:
Set mapreduce.jobtracker.split.metainfo.maxsize=100000000 when submitting the Hive query, then run a large Hive query that writes data into a partitioned table. Because of the large number of splits, the job submitted to Hadoop fails with an exception:
meta data size exceeds 100000000.
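The fix implied by the description is to deduplicate hostnames when building split locations. A minimal sketch of that idea, assuming a hypothetical helper `dedupedLocations` (not the actual Hive code): collecting hosts into a `LinkedHashSet` drops the repeats that inflate the serialized split meta info, whereas appending to a `List` records every duplicate.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SplitLocations {
    // Hypothetical helper illustrating the fix: collect hostnames into a
    // Set so duplicates reported by the underlying blocks are dropped,
    // instead of appending every hostname to an array list.
    static String[] dedupedLocations(List<String> rawHosts) {
        // LinkedHashSet preserves first-seen order while removing repeats.
        Set<String> hosts = new LinkedHashSet<>(rawHosts);
        return hosts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>(
                List.of("host1", "host2", "host1", "host3", "host2"));
        String[] locs = dedupedLocations(raw);
        // A List would record 5 locations here; the Set keeps only 3,
        // shrinking the per-split location data that counts toward
        // mapreduce.jobtracker.split.metainfo.maxsize.
        System.out.println(locs.length); // prints 3
    }
}
```

With many splits, each carrying duplicated hostnames, the saved bytes multiply across the whole job, which is how the meta info limit gets exceeded in the first place.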
Attachments
Issue Links
- is related to: MAPREDUCE-2021 "CombineFileInputFormat returns duplicate hostnames in split locations" (Closed)