  1. Hive
  2. HIVE-21466

Increase Default Size of SPLIT_MAXSIZE

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0, 4.0.0
    • Fix Version/s: None
    • Component/s: Configuration
    • Labels:
      None
    • Flags:
      Patch

      Description

       MAPREDMAXSPLITSIZE(FileInputFormat.SPLIT_MAXSIZE, 256000000L, "", true),
      

      https://github.com/apache/hive/blob/8d4300a02691777fc96f33861ed27e64fed72f2c/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L682

      This field specifies the maximum size of each MR split (and possibly of splits for other execution engines).

      This number should be a multiple of the HDFS block size. The maximum is implemented as follows: each block is added to the current split, and once the split grows larger than the maximum allowed, it is submitted to the cluster and a new split is opened.

      So, imagine the following scenario:

      • HDFS block size of 16 bytes
      • Maximum size of 40 bytes

      This will produce a split with 3 blocks. After two blocks the split holds (2x16) = 32 bytes, which is still under the maximum, so another block is added and the split reaches (3x16) = 48 bytes. So, while many operators would expect a split of 2 blocks, the actual split has 3 blocks. Setting the maximum split size to a multiple of the HDFS block size makes this behavior less confusing.
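The scenario above can be reproduced with a small standalone sketch. It is not Hive or Hadoop code: the class and method names are hypothetical, and it assumes that a split is closed as soon as its accumulated size reaches or exceeds the maximum, which matches the 16/40 example and is only an approximation of the real split logic.

```java
// Hypothetical illustration only; not taken from Hive or Hadoop.
class SplitSketch {

    /**
     * Counts how many whole blocks end up in the first split.
     * Assumption: blocks are appended one at a time and the split is
     * closed once its size reaches or exceeds maxSplitSize.
     */
    static int blocksInFirstSplit(long blockSize, long maxSplitSize, int availableBlocks) {
        long splitSize = 0;
        int blocks = 0;
        while (blocks < availableBlocks) {
            splitSize += blockSize;   // a whole block is always added first
            blocks++;
            if (splitSize >= maxSplitSize) {
                break;                // the limit is only checked afterwards
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Maximum of 40 bytes is not a multiple of the 16-byte block size.
        System.out.println(blocksInFirstSplit(16, 40, 10));
        // Maximum of 32 bytes is exactly two blocks.
        System.out.println(blocksInFirstSplit(16, 32, 10));
    }
}
```

Under these assumptions the first call yields 3 blocks (48 bytes, overshooting the 40-byte maximum), while the second yields exactly 2 blocks, which is the outcome an operator would intuitively expect.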

      The current setting is ~256MB. When it was introduced, the default HDFS block size was 64MB, so the maximum was roughly 4x the block size. HDFS block sizes now default to 128MB, so I propose setting this to 4x128MB. The larger splits (fewer tasks) should give a nice performance boost on modern hardware.
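To make the arithmetic concrete, here is a short sketch that compares the current default with the proposed value. It assumes "128MB" means 128 MiB (134217728 bytes, the HDFS default block size); the class and constant names are hypothetical, and the exact value used in the attached patches is not stated in this issue.

```java
// Hypothetical illustration only; names are not from Hive or Hadoop.
class SplitDefaults {
    static final long MIB = 1024L * 1024L;

    // Current default, as quoted from HiveConf in the description.
    static final long CURRENT_DEFAULT = 256_000_000L;

    // Assumption: the 128MB HDFS block size is 128 MiB.
    static final long HDFS_BLOCK_SIZE = 128 * MIB;

    /** Proposed maximum: four HDFS blocks. */
    static long proposedDefault() {
        return 4 * HDFS_BLOCK_SIZE;
    }

    /** True when the maximum split size is a whole number of blocks. */
    static boolean isBlockMultiple(long maxSplitSize, long blockSize) {
        return maxSplitSize % blockSize == 0;
    }

    public static void main(String[] args) {
        System.out.println(isBlockMultiple(CURRENT_DEFAULT, HDFS_BLOCK_SIZE));
        System.out.println(proposedDefault());
        System.out.println(isBlockMultiple(proposedDefault(), HDFS_BLOCK_SIZE));
    }
}
```

Under that assumption the current default of 256000000 bytes is not a whole multiple of the block size, whereas 4x128 MiB comes to 536870912 bytes and is.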

        Attachments

        1. HIVE-21466.1.patch
          1 kB
          David Mollitor
        2. HIVE-21466.2.patch
          21 kB
          David Mollitor

            People

            • Assignee:
              belugabehr David Mollitor
              Reporter:
              belugabehr David Mollitor
            • Votes:
              0
              Watchers:
              2
