Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-18362

Introduce a parameter to control the max row number for map join convertion

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      The compression ratio of the Orc compressed file will be very high in some cases.

      The test table has three Int columns, with twelve million records, but the compressed file size is only 4M. Hive will automatically converts the Join to Map join, but this will cause memory overflow. So I think it is better to have a parameter to limit to the total number of table records in the Map Join convertion, and if the total number of records is larger than that, it can not be converted to Map join.

      hive.auto.convert.join.max.number = 2500000L

      The default value for this parameter is 2500000, because so many records occupy about 700M memory in clint JVM, and 2500000 records for Map Join are also large tables.

        Attachments

          Activity

            People

            • Assignee:
              gopalv Gopal Vijayaraghavan
              Reporter:
              wankunde wan kun
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Due:
                Created:
                Updated: