IMPALA-3173

Reduce catalog's memory footprint


Details

    Description

      An initial analysis of the catalog's heap dumps shows that we can probably reduce its memory footprint by: a) not storing redundant information about catalog entities such as partitions, and b) using more compact data structures.

      Currently, for a table with 2 int columns and 1 int partition column and without incremental stats, we use:

      • ~930B per partition, of which ~500B are used by hmsParameters_ (a Map<String, String>), ~190B by cachedMsPartitionDescriptor_, and ~200B (depending on the path) by location.
      • ~800B per file descriptor, of which ~530B go to file_blocks and the rest is used for storing the file_name.
      • Every HdfsTable also uses two maps that replicate partition locations and file names (e.g. perPartitionFileDescMap_ and nameToPartitionMap_).
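One way to picture option (a) for the location field is to store the table's base path once and keep only the partition-specific suffix per partition. The following is a minimal Python sketch of that idea, not code from Impala: the class name PartitionLocations, its methods, and the example paths are all hypothetical and chosen only for illustration.

```python
class PartitionLocations:
    """Hypothetical illustration: share the table base path across partitions."""

    def __init__(self, table_base):
        # The shared prefix is stored exactly once, with a trailing slash.
        self._base = table_base.rstrip("/") + "/"
        # partition name -> (is_relative, suffix or full location)
        self._entries = {}

    def add(self, partition_name, location):
        if location.startswith(self._base):
            # Common case: the partition lives under the table directory,
            # so only the suffix needs to be kept.
            self._entries[partition_name] = (True, location[len(self._base):])
        else:
            # Partition stored elsewhere: keep the full location.
            self._entries[partition_name] = (False, location)

    def get(self, partition_name):
        is_relative, value = self._entries[partition_name]
        return self._base + value if is_relative else value


# Example with made-up paths.
locations = PartitionLocations("hdfs://nn:8020/warehouse/t")
locations.add("p=1", "hdfs://nn:8020/warehouse/t/p=1")
locations.add("p=2", "s3a://bucket/other/p=2")
print(locations.get("p=1"))
```

The saving grows with the number of partitions, since the base path is paid for once rather than once per partition; whether this is the right trade-off for the catalog depends on details not covered in this issue.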

      A table like that with 100,000 partitions and 10 files per partition (1,000,000 file descriptors) requires roughly 1GB of memory without incremental stats and roughly 1.4GB with them.
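The per-entity costs listed above were measured without incremental stats, and they add up to roughly the smaller of the two totals (~1GB). A back-of-the-envelope check in Python, using only the approximate figures from this description (the function and constant names are illustrative, and the two extra per-table maps are not included):

```python
# Approximate per-entity costs quoted in the description, in bytes,
# for a table without incremental stats.
BYTES_PER_PARTITION = 930
BYTES_PER_FILE_DESCRIPTOR = 800


def estimate_catalog_bytes(num_partitions, files_per_partition):
    """Rough heap estimate: partition objects plus file descriptors."""
    partition_bytes = num_partitions * BYTES_PER_PARTITION
    file_bytes = num_partitions * files_per_partition * BYTES_PER_FILE_DESCRIPTOR
    return partition_bytes + file_bytes


total = estimate_catalog_bytes(100_000, 10)
print(f"{total / 1e9:.2f} GB")  # about 0.89 GB
```

The file descriptors account for about 800MB of that total and the partitions for about 93MB, so the file descriptors dominate for a table of this shape.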

      This is a parent JIRA of IMPALA-2840.


          People

            Assignee: Unassigned
            Reporter: Dimitris Tsirogiannis (dtsirogiannis)
