  Hive / HIVE-15882

HS2 generating high memory pressure with many partitions and concurrent queries


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: HiveServer2
    • Labels: None

    Description

      I've created a Hive table with 2000 partitions, each backed by two files, with one row in each file. When I run a number of concurrent queries against this table, for example:

      for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:10000 -n admin -p admin -e "select count(i_f_1) from misha_table;" & done
      

      it results in a large memory spike. With 20 queries, HS2 running with -Xmx200m ran out of memory (OOM); with 50 queries, so did HS2 running with -Xmx500m.

      I am attaching the results of a jxray (www.jxray.com) analysis of a heap dump that was generated in the 50-query / 500 MB heap scenario. It suggests that there are several opportunities to reduce memory pressure with relatively non-invasive changes to the code:

      1. 24.5% of memory is wasted by duplicate strings (see section 6). By adding String.intern() calls in the ~10 relevant places in the code, this overhead can be greatly reduced.

      2. Almost 20% of memory is wasted by suboptimally used collections (see section 8). Many maps and lists are either empty or contain just one element. By modifying the code that creates and populates these collections, we are likely to save 5-10% of memory.

      3. Almost 20% of memory is used by instances of java.util.Properties. These objects appear to be largely duplicates, since for each partition every concurrently running query creates its own copies of Partition, PartitionDesc and Properties. Thus we have roughly 100,000 (50 queries * 2,000 partitions) Properties objects in memory. By interning/deduplicating these objects we may be able to save perhaps 15% of memory.
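A minimal sketch of the idea in item 1, not the actual Hive change: equal strings that are decoded separately (as metadata is for each query) are distinct objects, while String.intern() maps them all to one canonical instance. The helper intern(), the class name and the example value are hypothetical; the null check exists because calling intern() on a null reference would throw NullPointerException.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class StringInternDemo {
    // Example value that many partition descriptors would share (illustrative only)
    private static final String INPUT_FORMAT = "org.apache.hadoop.mapred.TextInputFormat";

    /** Hypothetical null-safe wrapper around String.intern(). */
    static String intern(String s) {
        return s == null ? null : s.intern();
    }

    /** Creates {@code copies} equal strings and counts how many distinct objects are kept. */
    static int distinctInstances(int copies, boolean useIntern) {
        // Identity-based set: counts objects, not equal values
        Set<String> kept = Collections.newSetFromMap(new IdentityHashMap<>());
        byte[] raw = INPUT_FORMAT.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < copies; i++) {
            // Decoding bytes always allocates a new String, like reading metadata per query
            String value = new String(raw, StandardCharsets.UTF_8);
            kept.add(useIntern ? intern(value) : value);
        }
        return kept.size();
    }

    public static void main(String[] args) {
        System.out.println("without intern(): " + distinctInstances(50, false) + " String objects");
        System.out.println("with intern():    " + distinctInstances(50, true) + " String object");
    }
}
```

Without interning, 50 equal strings occupy 50 separate objects; with interning, only the canonical copy is retained and the per-query copies become garbage.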
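For item 2, one common way to avoid paying for empty or single-element collections is to store them in the smallest adequate form. The sketch below is an assumption about how such code could look, not Hive's implementation; compact() is a hypothetical helper. It relies on Collections.emptyMap() (one shared instance) and Collections.singletonMap() (one small object with no hash table), and only falls back to a right-sized HashMap for two or more entries.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class CompactCollections {
    /**
     * Hypothetical helper: copy a map into the smallest adequate representation.
     * The 0- and 1-entry results are immutable, so callers must not modify them.
     */
    static <K, V> Map<K, V> compact(Map<K, V> source) {
        switch (source.size()) {
            case 0:
                // One shared immutable instance for the whole JVM
                return Collections.emptyMap();
            case 1: {
                // A single small object instead of a HashMap plus its 16-slot table
                Map.Entry<K, V> entry = source.entrySet().iterator().next();
                return Collections.singletonMap(entry.getKey(), entry.getValue());
            }
            default: {
                // Size the table so all entries fit under the default 0.75 load factor
                Map<K, V> copy = new HashMap<>((int) Math.ceil(source.size() / 0.75));
                copy.putAll(source);
                return copy;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> none = new HashMap<>();
        Map<String, String> one = new HashMap<>();
        one.put("serialization.format", "1");
        System.out.println(compact(none).getClass().getSimpleName());
        System.out.println(compact(one).getClass().getSimpleName());
    }
}
```

The trade-off is immutability: code that later adds entries must either copy the map first or allocate lazily on the first insertion.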
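Item 3 can be illustrated with a content-based interner for java.util.Properties, whose equals() and hashCode() compare entries. This is a hypothetical sketch: PropertiesInterner and the property keys/values are illustrative assumptions, and the loop only mimics 50 queries each building Properties for 2,000 partitions. Two caveats apply: interned instances must be treated as read-only (mutating one would change its hash while it is a map key and would affect every sharer), and this strong map never shrinks, so a production version might prefer a weak interner such as Guava's Interners.newWeakInterner().

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical content-based interner; interned instances must not be mutated. */
final class PropertiesInterner {
    private final ConcurrentMap<Properties, Properties> canonical = new ConcurrentHashMap<>();

    /** Returns a shared instance equal to {@code props}, registering props if it is new. */
    Properties intern(Properties props) {
        Properties existing = canonical.putIfAbsent(props, props);
        return existing != null ? existing : props;
    }

    /** Simulates each query building its own Properties per partition; counts survivors. */
    static int retainedInstances(PropertiesInterner interner, int queries, int partitions) {
        Set<Properties> retained = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int q = 0; q < queries; q++) {
            for (int p = 0; p < partitions; p++) {
                Properties props = new Properties();
                // Illustrative per-partition metadata; only the location differs by partition
                props.setProperty("location", "/warehouse/misha_table/part=" + p);
                props.setProperty("serialization.format", "1");
                // The caller keeps the interned reference, so its own copy becomes garbage
                retained.add(interner.intern(props));
            }
        }
        return retained.size();
    }

    public static void main(String[] args) {
        int retained = retainedInstances(new PropertiesInterner(), 50, 2000);
        System.out.println("Properties objects created: " + (50 * 2000));
        System.out.println("Properties objects retained after interning: " + retained);
    }
}
```

In this simulation the number of retained Properties objects depends only on the number of distinct partitions, not on the number of concurrent queries.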

      So overall, I think there is a good chance to reduce HS2 memory consumption in this scenario by ~40%.

      Attachments

        1. HIVE-15882.01.patch (33 kB, Misha Dmitriev)
        2. HIVE-15882.02.patch (33 kB, Misha Dmitriev)
        3. HIVE-15882.03.patch (33 kB, Misha Dmitriev)
        4. HIVE-15882.04.patch (34 kB, Misha Dmitriev)
        5. hs2-crash-2000p-500m-50q.txt (65 kB, Misha Dmitriev)


            People

              Assignee: Misha Dmitriev (misha@cloudera.com)
              Reporter: Misha Dmitriev (misha@cloudera.com)
              Votes: 0
              Watchers: 10
