IMPALA / IMPALA-7127 Fetch-on-demand metadata for the impalad-side catalog / IMPALA-7501

Slim down metastore Partition objects in LocalCatalog cache

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 3.3.0, Impala 3.4.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Catalog

      Description

      I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit after running a production workload simulation for a couple of hours. The heap held 38.5M objects and 2.02GB; as expected, the vast majority of it belongs to the LocalCatalog cache. Of this total footprint, 1.78GB and 34.6M objects are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M objects (roughly 64% of the heap and 87% of the objects) are retained by FieldSchema objects, which, as far as I remember, the Impala planner ignores at the partition level. So, by slimming down these objects a bit, we could substantially increase effective cache capacity for a fixed budget. Reducing the object count should also improve GC performance, since old-gen GC cost is more closely tied to object count than to size.
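
      The slimming idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the change that resolved this issue: the classes FieldSchema, StorageDescriptor and Partition below are simplified, hypothetical stand-ins for the metastore types, and the slim() helper is an invented name. The sketch assumes that the per-partition column list really is unused by the planner, as the description suggests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins for the metastore objects discussed in
// the description. They are NOT the real Hive or Impala classes.
final class PartitionSlimmingSketch {

    static final class FieldSchema {
        final String name;
        final String type;
        final String comment;

        FieldSchema(String name, String type, String comment) {
            this.name = name;
            this.type = type;
            this.comment = comment;
        }
    }

    static final class StorageDescriptor {
        List<FieldSchema> cols = new ArrayList<>();
        String location;
    }

    static final class Partition {
        List<String> values = new ArrayList<>();
        StorageDescriptor sd = new StorageDescriptor();
    }

    // Replaces the per-partition column list with the shared immutable empty
    // list, so the FieldSchema objects become unreachable from this partition.
    // Returns how many FieldSchema references were dropped.
    static int slim(Partition p) {
        if (p.sd == null || p.sd.cols == null) {
            return 0;
        }
        int dropped = p.sd.cols.size();
        p.sd.cols = Collections.emptyList();
        return dropped;
    }

    // Builds a partition carrying its own copy of a numCols-wide schema,
    // mimicking the duplication observed in the heap dump.
    static Partition samplePartition(int numCols) {
        Partition p = new Partition();
        p.values.add("0");
        p.sd.location = "/warehouse/t/p=0";
        for (int i = 0; i < numCols; i++) {
            p.sd.cols.add(new FieldSchema("c" + i, "string", ""));
        }
        return p;
    }

    public static void main(String[] args) {
        Partition p = samplePartition(50);
        int dropped = slim(p);
        System.out.println("dropped " + dropped
            + " FieldSchema objects; location kept: " + p.sd.location);
    }
}
```

      Applying such a step before a partition enters the cache would keep partition-specific fields (here, the partition values and location) while releasing the duplicated column metadata. Whether any code path still reads the partition-level columns would need to be checked first.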

        Attachments

        1. impalad_dominator_tree.txt
          20 kB
          Quanlong Huang
        2. impalad_histogram.txt
          2 kB
          Quanlong Huang

              People

              • Assignee: Quanlong Huang (stigahuang)
              • Reporter: Todd Lipcon (tlipcon)
              • Votes: 0
              • Watchers: 7
