HIVE-13665

HS2 memory leak when multiple queries are running with get_json_object


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.1.0, 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: UDF
    • Labels: None

    Description

      The extractObjectCache in UDFJson grows past its limit (CACHE_SIZE = 16) when multiple queries run concurrently on HS2 in local mode (not MR/Tez) with get_json_object or get_json_tuple.
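
      For reference, the cache named in the heap dump below (UDFJson$HashCache) follows the standard bounded-LRU LinkedHashMap idiom. A minimal sketch, assuming the usual shape of such a cache: CACHE_SIZE = 16 is from this description, loadFactor = 0.6 and accessOrder = false match the heap dump, and the initial capacity is illustrative.

      static class HashCache<K, V> extends LinkedHashMap<K, V> {
          private static final int CACHE_SIZE = 16;
          private static final int INIT_SIZE = 32;      // illustrative
          private static final float LOAD_FACTOR = 0.6f;

          public HashCache() {
              super(INIT_SIZE, LOAD_FACTOR);
          }

          @Override
          protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
              // Evict the eldest entry once the map exceeds CACHE_SIZE.
              // This bound only holds if put() calls are never interleaved
              // across threads.
              return size() > CACHE_SIZE;
          }
      }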

      HS2 heap_dump
      Object at 0x515ab18f8
      instance of org.apache.hadoop.hive.ql.udf.UDFJson$HashCache@0x515ab18f8 (77 bytes)
      Class:
      class org.apache.hadoop.hive.ql.udf.UDFJson$HashCache
      Instance data members:
      accessOrder (Z) : false
      entrySet (L) : <null>
      hashSeed (I) : 0
      header (L) : java.util.LinkedHashMap$Entry@0x515a577d0 (60 bytes) 
      keySet (L) : <null>
      loadFactor (F) : 0.6
      modCount (I) : 4741146
      size (I) : 2733158                   <========== here!!
      table (L) : [Ljava.util.HashMap$Entry;@0x7163d8b70 (67108880 bytes) 
      threshold (I) : 5033165
      values (L) : <null>
      References to this object:
      

      I think this problem is caused by LinkedHashMap not being thread-safe; its Javadoc warns:

      "Note that this implementation is not synchronized. If multiple threads
      access a linked hash map concurrently, and at least one of the threads
      modifies the map structurally, it must be synchronized externally. This is
      typically accomplished by synchronizing on some object that naturally
      encapsulates the map."
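
      The race is easy to demonstrate outside Hive. Below is a minimal standalone sketch (not part of this report): several threads put into one unsynchronized bounded LinkedHashMap, and the final size ends up far beyond the cap. Depending on timing and JDK version the run may also hang or throw, since concurrent structural modification leaves the map in an undefined state.

      import java.util.LinkedHashMap;
      import java.util.Map;

      public class LinkedHashMapRace {
          static final int CACHE_SIZE = 16;

          // Same bounded-cache idiom as UDFJson's HashCache, shared across
          // threads with no synchronization.
          static final Map<String, Object> cache =
                  new LinkedHashMap<String, Object>(32, 0.6f) {
              @Override
              protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                  return size() > CACHE_SIZE;
              }
          };

          public static void main(String[] args) throws InterruptedException {
              Thread[] threads = new Thread[8];
              for (int t = 0; t < threads.length; t++) {
                  final int id = t;
                  threads[t] = new Thread(() -> {
                      for (int i = 0; i < 1000000; i++) {
                          // Unsynchronized structural modification.
                          cache.put(id + ":" + i, Boolean.TRUE);
                      }
                  });
                  threads[t].start();
              }
              for (Thread th : threads) {
                  th.join();
              }
              // A correctly bounded cache would report at most CACHE_SIZE + 1;
              // under the race the size is typically far larger, mirroring the
              // size = 2733158 seen in the heap dump above.
              System.out.println("size = " + cache.size());
          }
      }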
      

      Reproduce:

      1. Run multiple queries with get_json_object over small input data (so that they execute in HS2 local mode)
      2. Take a JVM heap dump and analyze it

        Test scenario:
        1.hql :
        SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040105' 
        2.hql :
        SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040106'
        3.hql :
        SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040107'
        4.hql :
        SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040108'
         
        run.sh :
        t_cnt=0
        while true
        do
            echo "query executing..."
            for i in 1 2 3 4
            do
                beeline -u jdbc:hive2://localhost:10000 -n hive --silent=true -f $i.hql > $i.log 2>&1 &
            done
            wait
            t_cnt=`expr $t_cnt + 1`
            echo "query count : $t_cnt"
            sleep 2
        done
        
        JVM heap dump & analysis:
        jmap -dump:format=b,file=hive.dmp $PID
        jhat -J-mx48000m -port 8080 hive.dmp &
        

      Finally, I have attached our patch.
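
      (The patch itself is attached rather than inlined. For illustration only, and not necessarily the approach the attached patch takes, the two usual remedies for this pattern are external synchronization or a per-thread cache, reusing the HashCache sketch above.)

      import java.util.Collections;
      import java.util.Map;

      class FixSketch {
          // Option 1: synchronize every access to the shared cache.
          static final Map<String, Object> sharedCache =
                  Collections.synchronizedMap(new HashCache<String, Object>());

          // Option 2: give each HS2 worker thread its own cache, so no two
          // threads ever modify the same map (costs up to CACHE_SIZE entries
          // per thread).
          static final ThreadLocal<Map<String, Object>> perThreadCache =
                  ThreadLocal.withInitial(HashCache::new);
      }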

      Attachments

        1. patch.lst.txt
          6 kB
          JinsuKim


            People

              Assignee: jinzheng
              Reporter: JinsuKim (goodjins)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: