Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.10.0, Impala 2.11.0
    • Fix Version/s: Impala 2.12.0
    • Component/s: Frontend
    • Labels: None

      Description

      We should introduce a minimum sample size in bytes for COMPUTE STATS TABLESAMPLE. Reasons:

      • For small tables, sampling does not make sense: accurate stats can be obtained cheaply without it.
      • Very small samples rarely make sense, because some minimum amount of data is required to get meaningful stats.

      I think a 1GB minimum might be a good choice, and ideally this minimum sample size would be configurable.

      Many other DBMSs offer stats collection with sampling, and in many cases a minimum sample size is required to get any meaningful stats.
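
To make the proposal concrete, here is a minimal sketch of how a minimum sample size could interact with the requested sampling percentage. It is not Impala's implementation: the function name, its parameters, and the choice to raise the effective percentage rather than reject the statement are all assumptions made for illustration. The 1 GiB default is taken from the suggestion above.

```python
GIB = 1024 ** 3  # bytes in one GiB


def effective_sample_fraction(table_bytes, requested_percent, min_sample_bytes=GIB):
    """Return the fraction of the table to scan, in the range [0.0, 1.0].

    Hypothetical helper: all names and the clamping policy are assumptions.
    """
    if not 0 <= requested_percent <= 100:
        raise ValueError("requested_percent must be between 0 and 100")
    if table_bytes < 0 or min_sample_bytes < 0:
        raise ValueError("byte counts must be non-negative")

    # Small tables: sampling saves little, so scan everything.
    if table_bytes <= min_sample_bytes:
        return 1.0

    requested_fraction = requested_percent / 100.0
    # Smallest fraction that still yields min_sample_bytes of data.
    min_fraction = min_sample_bytes / table_bytes
    return max(requested_fraction, min_fraction)
```

Under these assumptions, a 10 percent sample of a 4 GiB table is raised to 25 percent so that roughly 1 GiB is read, while a 512 MiB table is scanned in full. Passing `min_sample_bytes=0` disables the minimum, which is one way the limit could be made configurable.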

            People

            • Assignee: Alexander Behm (alex.behm)
            • Reporter: Alexander Behm (alex.behm)