  Hive / HIVE-16489

HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Hive
    • Labels: None

      Description

      I've just analyzed an HMS heap dump. It turns out that it contains a lot of duplicate strings, which waste 26.4% of the heap. Most of them come from HashMaps referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. Below is the relevant section of the jxray (www.jxray.com) report. Looking at Partition.java, I see that someone has already added code to intern the keys and values in the parameters table when it is first set up. However, it looks like key-value pairs added later are not interned, which probably explains all these duplicate strings.
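      A minimal sketch of the idea, assuming the fix is simply to intern every key and value on each insertion rather than only when the map is first set up. `InterningParameters` is a hypothetical stand-in for the Thrift-generated Partition class, not Hive's actual code. It uses `String.intern()`, in line with the existing interning the description mentions; a dedicated interner (for example a weak-reference pool) would be an equally reasonable choice.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the parameters map held by
 * org.apache.hadoop.hive.metastore.api.Partition. Not Hive's actual code.
 */
public class InterningParameters {
    private final Map<String, String> parameters = new HashMap<>();

    /** Initial setup: intern every key and value copied from the source map. */
    public InterningParameters(Map<String, String> initial) {
        if (initial != null) {
            for (Map.Entry<String, String> e : initial.entrySet()) {
                put(e.getKey(), e.getValue());
            }
        }
    }

    /** Later additions go through the same path, so they are interned too. */
    public void put(String key, String value) {
        parameters.put(intern(key), intern(value));
    }

    public String get(String key) {
        return parameters.get(key);
    }

    /** Null-safe: HashMap allows null keys and values, String.intern() does not. */
    private static String intern(String s) {
        return s == null ? null : s.intern();
    }

    public static void main(String[] args) {
        InterningParameters p1 = new InterningParameters(null);
        InterningParameters p2 = new InterningParameters(null);
        // new String(...) creates distinct objects, like values decoded from Thrift.
        p1.put("k", new String("true"));
        p2.put("k", new String("true"));
        // After interning, both maps reference one shared String instance.
        System.out.println(p1.get("k") == p2.get("k")); // prints "true"
    }
}
```

      With this change, the many copies of values such as "true", "-1" and "8" in the report below would collapse to one instance each, at the cost of a string-table lookup per insertion.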

      6. DUPLICATE STRINGS
      
      Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  Overhead: 3,220,458K (26.4%)
      
      Top duplicate strings:
          Ovhd         Num char[]s   Num objs   Value
      
       46,088K (0.4%)     5871        5871      "HBa4rRAAGx2MEmludGVyZXN0cmF0ZXNwcmVhZBgM/wD/AP8AXAAAAqEAERYBFQAXAAAAAAAAIEAWuK0QAA1s ...[length 4000]"
       46,088K (0.4%)     5871        5871      "BQcHBQUGBQgGBQcHCAUGCAkECQcFBQwGBgoJBQYHBQUFBQYKBQgIBgUJEgYFDAYJBgcGBAcLBQYGCAgGCQYG ...[length 4000]"
      ...
      
      ===================================================
      
      7. REFERENCE CHAINS FOR DUPLICATE STRINGS
      
        2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing arrays:
      39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 3560]"
      ... and 419200 more strings, of which 36376 are unique
      Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 28 of "2", 21 of "0"
           <--  {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <--  {j.u.ArrayList} <-- org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success <-- Java Local (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
        463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing arrays:
      7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
      ... and 84009 more strings, of which 34065 are unique
      Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 of "2", 3 of "0"
           <--  {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <--  {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
        233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
      4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 of "10", 623 of "CQUJBQcFCAcGBwUFCgUIDAgEBwgFBQcHBwgGBwYEBQoLCggFCAYHBgcIBwkIDgcG ...[length 4000]", 623 of "BQcHBQUGBQgGBQcHCAUGCAkECQcFBQwGBgoJBQYHBQUFBQYKBQgIBgUJEgYFDAYJ ...[length 4000]", 623 of "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 3560]", 623 of "AAMAAAEAAAAAAAEAAAAAAQABAAEHAwAKAgAEAwAAAAAAAgAEAAAAAAMAAAADAAAA ...[length 4000]"
      ... and 44568 more strings, of which 27285 are unique
      Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
           <--  {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <--  {j.u.ArrayList} <-- Java Local (j.u.ArrayList) [@4f4cfbd10,@536122408,@726616778]
      ...
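
      The per-value overhead in the report can be approximated with simple arithmetic. This sketch assumes a JDK 8 layout on a 64-bit JVM with compressed oops: a 24-byte String object, a char[] with a 16-byte header and 2 bytes per char, 8-byte object alignment, and every copy beyond the first counted as waste. jxray's exact accounting is not described here, so treat this as an approximation; under these assumptions it reproduces the 46,088K shown for the top rows (5871 copies of a 4000-char string).

```java
/** Rough estimate of memory wasted by duplicate copies of one string value. */
public class DupStringOverhead {
    // Assumed JDK 8 object layout, 64-bit JVM with compressed oops.
    static final long STRING_OBJECT_BYTES = 24;     // String header + fields
    static final long CHAR_ARRAY_HEADER_BYTES = 16; // char[] header incl. length
    static final long ALIGNMENT = 8;                // objects padded to 8 bytes

    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Bytes occupied by one String plus its backing char[]. */
    static long bytesPerCopy(int length) {
        return STRING_OBJECT_BYTES + align(CHAR_ARRAY_HEADER_BYTES + 2L * length);
    }

    /** All copies except one are considered overhead. */
    static long overheadBytes(long copies, int length) {
        return copies <= 1 ? 0 : (copies - 1) * bytesPerCopy(length);
    }

    public static void main(String[] args) {
        // Top row of the report: 5871 copies of a 4000-char string.
        long kb = overheadBytes(5871, 4000) / 1024;
        System.out.println(kb + "K"); // prints "46088K"
    }
}
```

      Note that on JDK 9 and later, compact strings store Latin-1 text in a byte[] at 1 byte per char, so the same duplicates would cost roughly half as much there; the "Num char[]s" column suggests this dump came from an older JVM.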
      


    People

    • Assignee: Misha Dmitriev (misha@cloudera.com)
    • Reporter: Misha Dmitriev (misha@cloudera.com)
    • Votes: 0
    • Watchers: 5
