Details

      Description

      Implement a TABLESAMPLE clause that can be applied to base table references in queries and in the COMPUTE STATS statement.

      Examples:

      SELECT * FROM T TABLESAMPLE SYSTEM(10)
      COMPUTE STATS T TABLESAMPLE SYSTEM(20)
      

      Syntax inspired by SQL Server:
      https://technet.microsoft.com/en-us/library/ms189108(v=sql.105).aspx

      <tableref> TABLESAMPLE SYSTEM(<number>) [REPEATABLE(<number>)]
      

      Implementation details

      • The given percentage is the percentage of the table's total bytes to sample.
      • Sampling is coarse-grained: whole files are either included or excluded.
      • Impala randomly selects files until the desired percentage of bytes has been reached.
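
      The file-level selection described above can be sketched roughly as follows. This is an illustrative Python sketch, not Impala's actual code: the function name sample_files, the dict of file sizes, and the treatment of REPEATABLE(<number>) as a random seed are all assumptions made for the example.

```python
import random


def sample_files(file_sizes, percent, seed=None):
    """Pick whole files at random until at least `percent` of all bytes is covered.

    file_sizes: dict mapping a (hypothetical) file name to its size in bytes.
    percent:    desired sample size as a percentage of total bytes (0..100).
    seed:       optional seed; assumed here to play the role of REPEATABLE(<number>).
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    rng = random.Random(seed)
    total_bytes = sum(file_sizes.values())
    target_bytes = total_bytes * percent / 100

    # Sort first so that the same seed always yields the same shuffle.
    names = sorted(file_sizes)
    rng.shuffle(names)

    selected, selected_bytes = [], 0
    for name in names:
        if selected_bytes >= target_bytes:
            break
        selected.append(name)
        selected_bytes += file_sizes[name]
    return selected, selected_bytes


if __name__ == "__main__":
    files = {"part-0": 100, "part-1": 300, "part-2": 200, "part-3": 400}
    chosen, chosen_bytes = sample_files(files, percent=30, seed=1)
    print(chosen, chosen_bytes)
```

      Because files are taken whole, the sample usually overshoots the requested percentage; the overshoot is at most the size of the last file selected.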

      Accepted limitations

      • Computing stats on a coarse-grained sample necessarily loses precision, with no guarantee of statistical significance.
      • There is no guarantee that a sample covers all partitions.
      • NDVs (numbers of distinct values) may be very inaccurate for sorted files.
      • NDVs may be very inaccurate for an unfortunate selection of files.
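
      A small illustration of why sorted files hurt NDV estimates under file-level sampling. The data layout below is invented for the example (1000 rows, 100 distinct values, 10 files); it only compares the distinct values visible in a single sampled file and does not model how Impala actually estimates or extrapolates NDVs.

```python
def ndv(rows):
    """Number of distinct values in a list of rows."""
    return len(set(rows))


# 1000 rows holding 100 distinct values, stored in sorted order.
values = [v for v in range(100) for _ in range(10)]

# Sorted layout: each file holds 100 consecutive rows of the sorted column.
sorted_files = [values[i * 100:(i + 1) * 100] for i in range(10)]

# Interleaved layout: file i holds every 10th row starting at row i.
interleaved_files = [values[i::10] for i in range(10)]

true_ndv = ndv(values)                    # 100
sorted_sample_ndv = ndv(sorted_files[0])  # 10
mixed_sample_ndv = ndv(interleaved_files[0])  # 100

if __name__ == "__main__":
    print(true_ndv, sorted_sample_ndv, mixed_sample_ndv)
```

      Both samples cover 10% of the rows, yet the file from the sorted layout exposes only 10 of the 100 distinct values, while the file from the interleaved layout exposes all of them.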

            People

            • Assignee: Alexander Behm (alex.behm)
            • Reporter: Alexander Behm (alex.behm)