HIVE-2082

Reduce memory consumption in preparing MapReduce job

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The Hive client consumes a lot of memory when the number of input partitions is large. One reason is that each partition maintains its own list of FieldSchema objects, which is intended to support schema evolution. However, these lists are not currently used, and Hive applies the table-level schema to all partitions. This will be fixed in HIVE-2050, which will reduce the memory consumed by this part by almost half (1.2GB to 700MB for 20k partitions).

      Another large chunk of memory is consumed in the MapReduce job setup phase, when a PartitionDesc is created from each Partition object. Each PartitionDesc maintains a property object that contains the full list of columns and types. For the same reason as above, these should be the same as in the table-level schema, so a per-partition copy is redundant. Deserializer initialization also takes a large amount of memory and should be avoided. In my initial testing, these optimizations cut memory consumption in half (700MB to 300MB for 20k partitions).
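
      The idea described above can be illustrated with a minimal sketch. This is not the code from the attached patch and does not use Hive's real classes: TableSchema, PartitionSketch, DeserializerSketch, describe and the warehouse path are hypothetical names chosen for illustration, and the property keys are only examples. It shows one schema object shared by reference across all partition descriptors, and a deserializer that is created only if something asks for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch only: these are NOT Hive's TableDesc/PartitionDesc classes.
class SharedSchemaSketch {

    /** Table-level schema, built once per table. */
    static final class TableSchema {
        final Properties props = new Properties();

        TableSchema(String columns, String columnTypes) {
            // Illustrative keys; the full column list lives here and nowhere else.
            props.setProperty("columns", columns);
            props.setProperty("columns.types", columnTypes);
        }
    }

    /** Stand-in for an expensive-to-initialize deserializer. */
    static final class DeserializerSketch {
        final String columns;

        DeserializerSketch(TableSchema schema) {
            this.columns = schema.props.getProperty("columns");
        }
    }

    /** Per-partition descriptor that stores only what differs from the table. */
    static final class PartitionSketch {
        final TableSchema schema;        // shared reference, not a per-partition copy
        final String location;           // partition-specific state
        DeserializerSketch deserializer; // stays null until first requested

        PartitionSketch(TableSchema schema, String location) {
            this.schema = schema;
            this.location = location;
        }

        String columns() {
            return schema.props.getProperty("columns");
        }

        DeserializerSketch getDeserializer() {
            if (deserializer == null) {
                deserializer = new DeserializerSketch(schema); // lazy initialization
            }
            return deserializer;
        }
    }

    /** Builds n descriptors that all point at the same schema object. */
    static List<PartitionSketch> describe(TableSchema schema, int n) {
        List<PartitionSketch> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(new PartitionSketch(schema, "/warehouse/t/ds=" + i));
        }
        return out;
    }

    public static void main(String[] args) {
        TableSchema schema = new TableSchema("id,name", "int:string");
        List<PartitionSketch> parts = describe(schema, 20000);
        boolean shared = parts.get(0).schema == parts.get(parts.size() - 1).schema;
        System.out.println(parts.size() + " descriptors, schema shared: " + shared);
    }
}
```

      With this layout the per-partition cost is one reference plus the partition-specific fields, so the size of the column list no longer multiplies with the partition count. Whether the actual patch is structured this way is not stated in this issue.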

      1. HIVE-2082.patch
        286 kB
        Ning Zhang
      2. HIVE-2082.patch
        286 kB
        Ning Zhang
      3. HIVE-2082.patch
        286 kB
        Ning Zhang

        Activity

        Ning Zhang created issue
        Ning Zhang made changes
            Field: Attachment; New Value: HIVE-2082.patch [ 12475593 ]
        Ning Zhang made changes
            Field: Attachment; New Value: HIVE-2082.patch [ 12475594 ]
        Ning Zhang made changes
            Field: Attachment; New Value: HIVE-2082.patch [ 12475595 ]
        Ning Zhang made changes
            Field: Status; Original Value: Open [ 1 ]; New Value: Patch Available [ 10002 ]
        Namit Jain made changes
            Field: Status; Original Value: Patch Available [ 10002 ]; New Value: Open [ 1 ]
        Namit Jain made changes
            Field: Status; Original Value: Open [ 1 ]; New Value: Resolved [ 5 ]
            Field: Hadoop Flags; New Value: [Reviewed]
            Field: Resolution; New Value: Fixed [ 1 ]
        Carl Steinbach made changes
            Field: Fix Version/s; New Value: 0.8.0 [ 12316178 ]
            Field: Component/s; New Value: Query Processor [ 12312586 ]
        Carl Steinbach made changes
            Field: Status; Original Value: Resolved [ 5 ]; New Value: Closed [ 6 ]

          People

          • Assignee: Ning Zhang
          • Reporter: Ning Zhang
          • Votes: 0
          • Watchers: 1
