  1. Apache Drill
  2. DRILL-6552 Drill Metadata management "Drill Metastore"
  3. DRILL-7357

Expose Drill Metastore data through INFORMATION_SCHEMA

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.17.0
    • None

    Description

      Document:
      https://docs.google.com/document/d/10CkLdrlUJUNRrHKLeo8jTUJB8xAP1D0byTOvn8wNoF0/edit#heading=h.gzj2dj5a4yds
      Sections:
      5.19 INFORMATION_SCHEMA updates
      4.3.2 Using the statistics

      INFORMATION_SCHEMA tables will contain Metastore data only if metastore.enabled is set to true.

      This Jira adds new columns to the TABLES and COLUMNS tables and a new PARTITIONS table.
      Note: the new columns and the new table apply only to Metastore data; for data from other sources these columns will be set to null.
      Additional columns
      TABLES:
      TABLE_SOURCE - table data type: PARQUET, CSV, JSON
      LOCATION - table location: /tmp/nation
      NUM_ROWS - number of rows in a table if known, null if not known
      LAST_MODIFIED_TIME - table's last modification time

      COLUMNS:
      COLUMN_SIZE (already existed but was not included, applicable for all sources) - estimated column size, for example 1 for boolean, 11 for integer (sign + 10 digits), etc.
      COLUMN_DEFAULT (already existed but was never filled in) - column default value
      COLUMN_FORMAT - usually applicable for date time columns: yyyy-MM-dd
      NUM_NULLS - number of nulls in column values
      MIN_VAL - column min value in String representation: aaa
      MAX_VAL - column max value in String representation: zzz
      NDV - number of distinct values in column, expressed as a Double
      EST_NUM_NON_NULLS - estimated number of non-null values, expressed as a Double
      IS_NESTED - whether the column is nested. Nested columns are extracted from columns with struct type.
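
      The COLUMN_SIZE figure of 11 for integer (sign + 10 digits) can be checked with a little arithmetic. The sketch below assumes "integer" means a signed 32-bit value and that the size is the widest textual form of any value in range; the helper name display_width is hypothetical and is not taken from Drill.

```python
# Assumption: "integer" is a signed 32-bit type.
INT32_MIN = -2**31      # -2147483648
INT32_MAX = 2**31 - 1   #  2147483647


def display_width(lowest: int, highest: int) -> int:
    """Return the longest decimal string length over the two range bounds."""
    return max(len(str(lowest)), len(str(highest)))


# Ten digits plus one character for the minus sign.
integer_column_size = display_width(INT32_MIN, INT32_MAX)
print(integer_column_size)
```

      The widest value is the minimum, whose ten digits plus the minus sign give the 11 quoted above.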

      PARTITIONS table columns:
      TABLE_CATALOG - table catalog (currently we have only one catalog): DRILL
      TABLE_SCHEMA - table schema: dfs.tmp
      TABLE_NAME - table name: nation
      METADATA_KEY - top level segment key, the same for all nested segments and partitions: part_int=3
      METADATA_TYPE - SEGMENT or PARTITION
      METADATA_IDENTIFIER - current metadata identifier: part_int=3/part_varchar=g
      PARTITION_COLUMN - partition column name: part_varchar
      PARTITION_VALUE - partition column value: g
      LOCATION - segment location, null for partitions: /tmp/nation/part_int=3
      LAST_MODIFIED_TIME - last modification time
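
      The example values above suggest how the key, identifier, partition column and partition value relate to each other. The following is a minimal sketch of that relationship only, assuming the identifier is a "/"-separated path of column=value segments as in the examples; the function name is hypothetical and this is not how Drill itself necessarily derives the values.

```python
def describe_partition_row(metadata_identifier: str) -> dict:
    """Split an identifier such as 'part_int=3/part_varchar=g'
    into PARTITIONS-style fields (illustrative only)."""
    segments = metadata_identifier.split("/")
    # The last segment carries the partition column and value for this row.
    column, _, value = segments[-1].partition("=")
    return {
        # The top-level segment is shared by all nested segments and partitions.
        "METADATA_KEY": segments[0],
        "METADATA_IDENTIFIER": metadata_identifier,
        "PARTITION_COLUMN": column,
        "PARTITION_VALUE": value,
    }


row = describe_partition_row("part_int=3/part_varchar=g")
print(row)
```

      For the sample identifier, the sketch yields the same key, column and value as the examples listed for the PARTITIONS table.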

            People

              Assignee: Arina Ielchiieva
              Reporter: Arina Ielchiieva
              Reviewer: Vova Vysotskyi