Details
-
Bug
-
Status: Patch Available
-
Minor
-
Resolution: Unresolved
-
None
-
None
-
None
Description
The compression ratio of the Orc compressed file will be very high in some cases.
The test table has three Int columns, with twelve million records, but the compressed file size is only 4M. Hive will automatically converts the Join to Map join, but this will cause memory overflow. So I think it is better to have a parameter to limit to the total number of table records in the Map Join convertion, and if the total number of records is larger than that, it can not be converted to Map join.
hive.auto.convert.join.max.number = 2500000L
The default value for this parameter is 2500000, because so many records occupy about 700M memory in clint JVM, and 2500000 records for Map Join are also large tables.