HIVE-1567: increase hive.mapjoin.maxsize to 10 million

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      I saw that on a very wide table, Hive can process 1 million rows in less than one minute (selecting all columns).
      Setting hive.mapjoin.maxsize to 100k is too restrictive. Let's increase it to 10 million.

      1. hive-1567.patch
        5 kB
        Ashutosh Chauhan

        Activity

        nzhang Ning Zhang added a comment -

        hive.mapjoin.maxsize is there not for speed but to limit memory consumption. We saw OOM exceptions quite often before this parameter was introduced. Rather than increasing it blindly, a better way may be to estimate how many rows fit into memory based on the row size and available memory, and to adjust this parameter automatically.
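
        The estimate suggested in this comment could be sketched roughly as follows. This is not Hive code: the class name `MapJoinRowEstimator`, the method `estimateMaxRows`, and the `memoryFraction` safety margin are all hypothetical, and the average row size is assumed to be supplied by the caller (for example from sampling). It only illustrates deriving a row limit from a memory budget instead of hard-coding one.

        ```java
        // Hypothetical helper, not part of Hive: derives a row limit from a memory budget.
        final class MapJoinRowEstimator {

            /**
             * Estimates how many rows fit into a memory budget.
             *
             * @param availableBytes memory available to the in-memory table, in bytes
             * @param avgRowBytes    assumed average size of one row, including overhead
             * @param memoryFraction share of availableBytes to use, in (0, 1]
             * @return estimated maximum number of rows
             */
            static long estimateMaxRows(long availableBytes, long avgRowBytes, double memoryFraction) {
                if (availableBytes < 0 || avgRowBytes <= 0) {
                    throw new IllegalArgumentException("availableBytes must be >= 0 and avgRowBytes > 0");
                }
                if (memoryFraction <= 0.0 || memoryFraction > 1.0) {
                    throw new IllegalArgumentException("memoryFraction must be in (0, 1]");
                }
                // Keep a safety margin so the table does not consume the whole budget.
                long usableBytes = (long) (availableBytes * memoryFraction);
                return usableBytes / avgRowBytes;
            }

            public static void main(String[] args) {
                // Use the JVM's maximum heap as the budget; 1 KiB per row is an assumed value.
                long maxHeap = Runtime.getRuntime().maxMemory();
                long rows = estimateMaxRows(maxHeap, 1024L, 0.5);
                System.out.println("Estimated row limit: " + rows);
            }
        }
        ```

        With a wide table the assumed row size grows and the computed limit shrinks accordingly, which is the behaviour a fixed row count such as 100k or 10 million cannot provide.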

        ashutoshc Ashutosh Chauhan added a comment -

        I think that with HIVE-1754 this configuration is no longer useful. Attaching a patch to remove it.

        namit Namit Jain added a comment -

        +1

        namit Namit Jain added a comment -

        Committed. Thanks, Ashutosh.

        hudson Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1032 (See https://builds.apache.org/job/Hive-trunk-h0.21/1032/)
        HIVE-1567. Remove hive.mapjoin.maxsize - it was not being used
        (Ashutosh Chauhan via namit)

        namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188449
        Files :

        • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        • /hive/trunk/conf/hive-default.xml
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java

          People

          • Assignee: Ashutosh Chauhan (ashutoshc)
          • Reporter: He Yongqiang (he yongqiang)
          • Votes: 0
          • Watchers: 1