  1. IMPALA
  2. IMPALA-1122

COMPUTE STATS for only new partitions/columns

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.4
    • Fix Version/s: Impala 2.1
    • Component/s: None
    • Labels:
      None

      Description

      COMPUTE STATS is absolutely necessary for my JOIN queries to even finish. Its runs are extremely long, which is completely understandable given that my table has 1000s of partitions and 100s of columns.

      Now the frustrating part is that when running COMPUTE STATS twice, the second run is not really faster than the first.

      For my use-case, where I only add a few partitions every day and very rarely change the schema, it would be absolutely lovely to be able to optionally specify an INCREMENTAL option to COMPUTE STATS. The behavior should then be: "Skip computing stats for partitions/columns which already have computed stats." In short, I would like a way to tell Impala: "Trust me, the stats you have computed in the past are still valid, I only added some data."
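      The requested skip behavior can be sketched as follows. This is only an illustration of the idea, not Impala's implementation: the function name `partitions_needing_stats`, the partition names, and the convention that a missing row count is represented by `None` are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def partitions_needing_stats(row_counts: Dict[str, Optional[int]]) -> List[str]:
    """Return the partitions that have no stored stats yet.

    Hypothetical convention: a value of None means stats were never
    computed for that partition. Partitions with existing stats are
    trusted and skipped, as the request describes.
    """
    return [name for name, rows in row_counts.items() if rows is None]


# Hypothetical table state: two old partitions with stats, one new one without.
existing = {
    "day=2014-06-01": 1200,
    "day=2014-06-02": 980,
    "day=2014-06-03": None,
}

todo = partitions_needing_stats(existing)
print(todo)
```

      With a filter like this, the cost of a stats run would scale with the number of newly added partitions rather than with the size of the whole table.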

            People

            • Assignee:
              henryr Henry Robinson
            • Reporter:
              julienlehuen Julien Lehuen
            • Votes:
              3
            • Watchers:
              5
