IMPALA-5300: Implement TABLESAMPLE


Details

    Description

      Implement the TABLESAMPLE clause, which can be applied to base table references in queries and to the COMPUTE STATS statement.

      Examples:

      SELECT * FROM T TABLESAMPLE SYSTEM(10)
      COMPUTE STATS T TABLESAMPLE SYSTEM(20)
      

      Syntax inspired by SQL Server:
      https://technet.microsoft.com/en-us/library/ms189108(v=sql.105).aspx

      <tableref> TABLESAMPLE SYSTEM(<number>) [REPEATABLE(<number>)]
      

      Implementation details

      • The given percentage is a percentage of the table's total bytes, not of its rows.
      • Sampling is coarse-grained: whole files are either included or excluded.
      • Impala randomly selects files until the desired percentage of bytes has been reached.
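
The selection procedure described in the bullets above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Impala's actual implementation: the function name `sample_files`, the `(path, size)` tuple layout, and the use of a seed to model REPEATABLE are all hypothetical and chosen only for the example.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical representation of a table's files: (path, size in bytes).
FileDesc = Tuple[str, int]


def sample_files(files: List[FileDesc], percent: float,
                 seed: Optional[int] = None) -> List[FileDesc]:
    """Pick whole files at random until `percent` of the table's bytes is covered."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    total_bytes = sum(size for _, size in files)
    target_bytes = total_bytes * percent / 100.0

    # Assumption: a fixed seed stands in for REPEATABLE(<number>), so the
    # same seed yields the same sample as long as the file list is unchanged.
    rng = random.Random(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)

    selected: List[FileDesc] = []
    selected_bytes = 0
    for file_desc in shuffled:
        if selected_bytes >= target_bytes:
            break  # desired percentage of bytes reached
        selected.append(file_desc)
        selected_bytes += file_desc[1]
    return selected


# Example: ten files of 100 bytes each, sampled at 20% with a fixed seed.
example_files = [("file_%d" % i, 100) for i in range(10)]
example_sample = sample_files(example_files, 20, seed=1)
```

Because whole files are selected, the sample can overshoot the requested percentage; with a few large files the actual fraction of bytes read may be well above the number given in SYSTEM(...).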

      Accepted limitations

      • Computing stats on a coarse-grained sample necessarily loses precision, with no guarantee of statistical significance.
      • There is no guarantee that a sample covers all partitions.
      • NDVs (numbers of distinct values) may be very inaccurate for sorted files.
      • NDVs may be very inaccurate for an unfortunate selection of files.
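
To make the NDV limitation concrete, here is a small Python illustration with made-up data. The two tables, the fixed choice of sampled files, and the naive "scale by the inverse sampling fraction" extrapolation are assumptions for the example; the issue does not say how Impala extrapolates NDVs from a sample.

```python
def ndv(files):
    """Exact number of distinct values across the given files."""
    return len({value for rows in files for value in rows})


# Hypothetical table A: ten files of 100 rows, globally sorted on the column,
# so every file holds values that appear in no other file.
sorted_files = [list(range(i * 100, (i + 1) * 100)) for i in range(10)]

# Hypothetical table B: same shape, but only 5 distinct values that repeat
# in every file.
repeating_files = [[row % 5 for row in range(100)] for _ in range(10)]

# Assume a 20% file-level sample happened to pick files 2 and 7.
sample_idx = [2, 7]
scale = len(sorted_files) / len(sample_idx)  # inverse sampling fraction

sorted_sample_ndv = ndv([sorted_files[i] for i in sample_idx])
repeating_sample_ndv = ndv([repeating_files[i] for i in sample_idx])

# Naive extrapolation (an assumption, not Impala's method).
sorted_scaled = sorted_sample_ndv * scale
repeating_scaled = repeating_sample_ndv * scale

true_sorted_ndv = ndv(sorted_files)
true_repeating_ndv = ndv(repeating_files)
```

In this setup, scaling the sample NDV is right for the sorted table but overestimates the repeating table by the scale factor, while using the unscaled sample NDV is right for the repeating table but underestimates the sorted one. The sample alone does not reveal which case applies, which is why file-level sampling can give very inaccurate NDVs.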

          People

            alex.behm Alexander Behm