  1. IMPALA
  2. IMPALA-2649 improve incremental stats scalability
  3. IMPALA-7425

Add option to load incremental statistics from catalog


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Catalog
    • Labels: None

    Description

      Incremental statistics currently keep all of their supporting data in catalogd and in every impalad coordinator, even though this data is needed only while incremental statistics are being computed. When incremental statistics are used on tables with many columns, many partitions, or both, this data can dominate the overall memory footprint. This can lead to OOMs, increased network usage, and instability.
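As a rough illustration of why this data can dominate memory, the sketch below multiplies columns by partitions by a per-entry size, then by the number of copies held in the cluster (catalogd plus each coordinator). The figure of 400 bytes per column per partition is an assumption chosen for illustration and is not taken from this issue; the function names are hypothetical.

```python
# Back-of-envelope estimate only. The per-entry size is an assumed,
# illustrative figure, not a number stated in this issue.
ASSUMED_BYTES_PER_COLUMN_PER_PARTITION = 400


def incremental_stats_bytes(num_columns, num_partitions,
                            bytes_per_entry=ASSUMED_BYTES_PER_COLUMN_PER_PARTITION):
    """Size of one copy of a table's incremental stats data."""
    return num_columns * num_partitions * bytes_per_entry


def cluster_footprint_bytes(num_columns, num_partitions, num_coordinators):
    """Total across the cluster: one copy in catalogd plus one per coordinator."""
    per_copy = incremental_stats_bytes(num_columns, num_partitions)
    return per_copy * (1 + num_coordinators)


if __name__ == "__main__":
    # 100 columns x 10,000 partitions, replicated to 9 coordinators.
    one_copy = incremental_stats_bytes(100, 10_000)
    total = cluster_footprint_bytes(100, 10_000, 9)
    print(f"one copy: {one_copy / 1e6:.0f} MB, cluster total: {total / 1e9:.1f} GB")
```

Under these assumptions a single wide, heavily partitioned table already accounts for hundreds of megabytes per process, which is the growth pattern the description is concerned with.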

      Add an option to avoid propagating incremental stats to all coordinators and instead pull them on demand from the catalog, only when the COMPUTE INCREMENTAL STATS statement needs them.
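The proposed behavior can be sketched as a toy model. This is not Impala's implementation: the `Catalog` and `Coordinator` classes, the `pull_on_demand` option name, and the method names are all hypothetical, chosen only to contrast broadcasting the stats with fetching them when a compute-incremental-stats request arrives.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    # Hypothetical store: (table, partition) -> opaque incremental stats blob.
    incremental_stats: dict = field(default_factory=dict)
    fetch_count: int = 0

    def broadcast_payload(self, include_incremental_stats):
        """Metadata pushed to every coordinator on a catalog update."""
        payload = {"partitions": sorted(self.incremental_stats)}
        if include_incremental_stats:
            payload["incremental_stats"] = dict(self.incremental_stats)
        return payload

    def fetch_incremental_stats(self, table):
        """On-demand request issued by a coordinator for one table."""
        self.fetch_count += 1
        return {k: v for k, v in self.incremental_stats.items() if k[0] == table}


@dataclass
class Coordinator:
    catalog: Catalog
    pull_on_demand: bool  # hypothetical name for the proposed option
    cached: dict = field(default_factory=dict)

    def on_catalog_update(self):
        # With the option on, the stats blobs are left out of the broadcast.
        self.cached = self.catalog.broadcast_payload(
            include_incremental_stats=not self.pull_on_demand)

    def compute_incremental_stats(self, table):
        # Only this statement needs the per-partition stats.
        if self.pull_on_demand:
            return self.catalog.fetch_incremental_stats(table)
        stats = self.cached["incremental_stats"]
        return {k: v for k, v in stats.items() if k[0] == table}


if __name__ == "__main__":
    catalog = Catalog({("t", "p=1"): b"x", ("t", "p=2"): b"y"})
    coord = Coordinator(catalog, pull_on_demand=True)
    coord.on_catalog_update()
    print("incremental_stats" in coord.cached)          # stats not cached locally
    print(len(coord.compute_incremental_stats("t")))    # fetched when needed
```

The trade-off in this sketch is one extra round trip to the catalog per compute-incremental-stats request in exchange for coordinators no longer holding the stats data between requests.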


          People

            Assignee: Vuk Ercegovac (vukercegovac)
            Reporter: Vuk Ercegovac (vukercegovac)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:
