  1. IMPALA
  2. IMPALA-5300 Implement TABLESAMPLE
  3. IMPALA-6024

Add minimum sample size for COMPUTE STATS TABLESAMPLE

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.10.0, Impala 2.11.0
    • Fix Version/s: Impala 2.12.0
    • Component/s: Frontend
    • Labels: None

    Description

      We should introduce a minimum sample size in bytes for COMPUTE STATS TABLESAMPLE. Reasons:

      • For small tables, sampling does not make sense: accurate stats can be obtained cheaply from a full scan.
      • Very small samples mostly do not make sense either: some minimum amount of data is required to get meaningful stats.

      I think a 1GB minimum might be a good choice. Ideally, this minimum sample size would be configurable.

      Many other DBMSs support stats collection with sampling, and in many cases they require a minimum sample size to get any meaningful stats.
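
      To make the proposal concrete, here is a minimal sketch of how a minimum sample size could interact with the requested sampling percentage. It is only an illustration of the two reasons listed above, not the frontend implementation. The function name `effective_sample_bytes`, its parameters, and the choice to fall back to a full scan when the table is no larger than the minimum are all assumptions.

```python
GIB = 1024 ** 3  # 1GB minimum suggested in the description, taken here as 2**30 bytes


def effective_sample_bytes(table_bytes, sample_percent, min_sample_bytes=GIB):
    """Return how many bytes a sampled COMPUTE STATS would read.

    Hypothetical helper: the name, signature, and fallback policy are
    assumptions made for illustration only.
    """
    if table_bytes < 0:
        raise ValueError("table_bytes must be non-negative")
    if not 0 <= sample_percent <= 100:
        raise ValueError("sample_percent must be in [0, 100]")

    # Small table: sampling does not pay off, so read everything.
    if table_bytes <= min_sample_bytes:
        return table_bytes

    # Large table: honor the requested percentage, but never go
    # below the configured minimum sample size.
    requested = table_bytes * sample_percent // 100
    return max(requested, min_sample_bytes)
```

      Under these assumptions, a 512MB table is scanned in full, a 10% sample of a 5GB table is raised to 1GB, and a 10% sample of a 100GB table stays at 10GB.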

          People

            Assignee: Alexander Behm (alex.behm)
            Reporter: Alexander Behm (alex.behm)