  1. Apache Drill
  2. DRILL-6552 Drill Metadata management "Drill Metastore"
  3. DRILL-7357

Expose Drill Metastore data through INFORMATION_SCHEMA

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.17.0
    • Component/s: None
    • Labels:

      Description

      Document:
      https://docs.google.com/document/d/10CkLdrlUJUNRrHKLeo8jTUJB8xAP1D0byTOvn8wNoF0/edit#heading=h.gzj2dj5a4yds
      Sections:
      5.19 INFORMATION_SCHEMA updates
      4.3.2 Using the statistics

      information_schema tables will contain data from Metastore only if metastore.enabled is set to true.
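
A minimal sketch of how the option named above might be switched on from a client. It only builds the statement text using Drill's `ALTER SESSION SET` / `ALTER SYSTEM SET` syntax; the helper name `set_option_sql` is hypothetical. The ticket does not say at which scope the option should be set, so the session scope used here is an assumption.

```python
def set_option_sql(name: str, value: bool, scope: str = "SESSION") -> str:
    """Build a Drill ALTER ... SET statement for a boolean option.

    Hypothetical helper for illustration; it does not talk to Drill.
    """
    if scope not in ("SESSION", "SYSTEM"):
        raise ValueError(f"unsupported scope: {scope}")
    # Option names contain dots, so they are quoted with backticks.
    return f"ALTER {scope} SET `{name}` = {str(value).lower()}"


# Assumption: enabling the option for the current session is sufficient.
enable_metastore = set_option_sql("metastore.enabled", True)
print(enable_metastore)
```

The resulting string can be submitted through whichever Drill client is in use (sqlline, JDBC, REST).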

      This Jira adds new columns to the TABLES and COLUMNS tables and a new PARTITIONS table.
      Note: the new columns and the new table apply only to Metastore data; for data from other sources these columns are set to null.
      Additional columns
      TABLES:
      TABLE_SOURCE - table data type: PARQUET, CSV, JSON
      LOCATION - table location: /tmp/nation
      NUM_ROWS - number of rows in the table if known, null if not known
      LAST_MODIFIED_TIME - table's last modification time
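
As a sketch, the new TABLES columns listed above could be read with a query like the one built below. The helper name `tables_query` is hypothetical, and the filter values `dfs.tmp` and `nation` are taken from the examples in this ticket. Values are inlined into the SQL text for readability only; real client code should prefer parameter binding where the driver supports it.

```python
# Columns this ticket adds to INFORMATION_SCHEMA.`TABLES`.
TABLES_METASTORE_COLUMNS = [
    "TABLE_SOURCE",
    "LOCATION",
    "NUM_ROWS",
    "LAST_MODIFIED_TIME",
]


def tables_query(schema: str, table: str) -> str:
    """Build a SELECT over INFORMATION_SCHEMA.`TABLES` (hypothetical helper)."""
    columns = ", ".join(["TABLE_SCHEMA", "TABLE_NAME"] + TABLES_METASTORE_COLUMNS)
    # TABLES is quoted with backticks, as in Drill's own documentation.
    return (
        f"SELECT {columns} FROM INFORMATION_SCHEMA.`TABLES` "
        f"WHERE TABLE_SCHEMA = '{schema}' AND TABLE_NAME = '{table}'"
    )


query = tables_query("dfs.tmp", "nation")
print(query)
```

For tables that do not come from the Metastore, the four added columns are expected to come back as null, per the note above.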

      COLUMNS:
      COLUMN_SIZE (already existed but was not included, applicable for all sources) - estimated column size, for example for boolean 1, for integer 11 (sign + 10 digits), etc.
      COLUMN_DEFAULT (already existed but was never filled in) - column default value
      COLUMN_FORMAT - usually applicable for date time columns: yyyy-MM-dd
      NUM_NULLS - number of nulls in column values
      MIN_VAL - column min value in String representation: aaa
      MAX_VAL - column max value in String representation: zzz
      NDV - number of distinct values in column, expressed in Double
      EST_NUM_NON_NULLS - estimated number of non null values, expressed in Double
      IS_NESTED - whether the column is nested. Nested columns are extracted from columns with struct type.
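
The COLUMN_SIZE example for integers (11 = sign + 10 digits) can be checked with a short calculation. This sketch assumes "integer" means a signed 32-bit value and that the size is the character width of its longest textual form; it is not taken from Drill's implementation, and the helper name `display_width` is hypothetical.

```python
def display_width(value: int) -> int:
    """Number of characters needed to print the value, including any sign."""
    return len(str(value))


# Assumption: a signed 32-bit integer, whose widest value is the minimum.
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1

int_column_size = max(display_width(INT32_MIN), display_width(INT32_MAX))
print(int_column_size)
```

The minimum value `-2147483648` has ten digits plus the minus sign, which matches the 11 quoted in the description.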

      PARTITIONS table columns:
      TABLE_CATALOG - table catalog (currently we have only one catalog): DRILL
      TABLE_SCHEMA - table schema: dfs.tmp
      TABLE_NAME - table name: nation
      METADATA_KEY - top level segment key, the same for all nested segments and partitions: part_int=3
      METADATA_TYPE - SEGMENT or PARTITION
      METADATA_IDENTIFIER - current metadata identifier: part_int=3/part_varchar=g
      PARTITION_COLUMN - partition column name: part_varchar
      PARTITION_VALUE - partition column value: g
      LOCATION - segment location, null for partitions: /tmp/nation/part_int=3
      LAST_MODIFIED_TIME - last modification time
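
The example values above suggest how METADATA_KEY, PARTITION_COLUMN and PARTITION_VALUE relate to METADATA_IDENTIFIER. The sketch below derives them from the identifier under the assumption that it is a `/`-separated list of `name=value` segments, with the first segment as the key and the last one naming the partition column. This relationship is inferred only from the examples in this ticket, and the names `PartitionInfo` and `parse_metadata_identifier` are hypothetical.

```python
from typing import NamedTuple


class PartitionInfo(NamedTuple):
    metadata_key: str
    partition_column: str
    partition_value: str


def parse_metadata_identifier(identifier: str) -> PartitionInfo:
    """Split an identifier such as 'part_int=3/part_varchar=g' into its parts."""
    segments = identifier.split("/")
    # Assumption: the last segment describes the current partition column.
    column, separator, value = segments[-1].partition("=")
    if not separator:
        raise ValueError(f"segment without '=': {segments[-1]!r}")
    # Assumption: the first segment is the top level segment key.
    return PartitionInfo(segments[0], column, value)


info = parse_metadata_identifier("part_int=3/part_varchar=g")
print(info)
```

With the ticket's example identifier this yields the key `part_int=3`, the column `part_varchar` and the value `g`, the same values shown in the column descriptions.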

              People

              • Assignee: Arina Ielchiieva
              • Reporter: Arina Ielchiieva
              • Reviewer: Vova Vysotskyi