Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
None
None
Description
Document:
https://docs.google.com/document/d/10CkLdrlUJUNRrHKLeo8jTUJB8xAP1D0byTOvn8wNoF0/edit#heading=h.gzj2dj5a4yds
Sections:
5.19 INFORMATION_SCHEMA updates
4.3.2 Using the statistics
INFORMATION_SCHEMA tables will contain data from the Metastore only if the metastore.enabled option is set to true.
This Jira adds new columns to the TABLES and COLUMNS tables and adds a new PARTITIONS table.
Note: the new columns and the new table apply only to Metastore data; for data from other sources, these columns are set to null.
Additional columns
TABLES:
TABLE_SOURCE - table data format: PARQUET, CSV, JSON
LOCATION - table location: /tmp/nation
NUM_ROWS - number of rows in the table if known, null if not known
LAST_MODIFIED_TIME - table's last modification time
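The null rules for the new TABLES columns (NUM_ROWS is null when unknown, and all four are null for non-Metastore sources) can be sketched as below. This is a minimal illustration under stated assumptions: the class TableMetastoreColumns, the helper metastore_columns, the dict keys, and the Python types chosen for each column are all hypothetical and are not Drill's actual classes or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TableMetastoreColumns:
    """Hypothetical model of the new TABLES columns (not Drill's own class)."""
    table_source: Optional[str]              # e.g. "PARQUET", "CSV", "JSON"
    location: Optional[str]                  # e.g. "/tmp/nation"
    num_rows: Optional[int]                  # None when the row count is not known
    last_modified_time: Optional[datetime]   # Python type is an assumption


def metastore_columns(metadata: Optional[dict]) -> TableMetastoreColumns:
    """Fill the new columns from Metastore metadata, or return all nulls.

    `metadata` is None for tables that do not come from the Metastore;
    the dict keys used here are hypothetical.
    """
    if metadata is None:
        return TableMetastoreColumns(None, None, None, None)
    return TableMetastoreColumns(
        table_source=metadata.get("table_source"),
        location=metadata.get("location"),
        num_rows=metadata.get("num_rows"),  # stays None if the count is unknown
        last_modified_time=metadata.get("last_modified_time"),
    )
```

For example, metastore_columns(None) yields a row with every new column set to None, while a Metastore table without a known row count keeps its source and location but reports num_rows as None.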
COLUMNS:
COLUMN_SIZE (already existed but was not included; applicable to all sources) - estimated column size, for example 1 for boolean and 11 for integer (sign + 10 digits)
COLUMN_DEFAULT (already existed but was never filled in) - column default value
COLUMN_FORMAT - column format, usually applicable to date/time columns: yyyy-MM-dd
NUM_NULLS - number of null values in the column
MIN_VAL - minimum column value in string representation: aaa
MAX_VAL - maximum column value in string representation: zzz
NDV - number of distinct values in the column, expressed as a Double
EST_NUM_NON_NULLS - estimated number of non-null values, expressed as a Double
IS_NESTED - whether the column is nested; nested columns are extracted from columns of struct type
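The integer figure for COLUMN_SIZE follows from the widest 32-bit signed value: -2147483648 has 10 digits plus a minus sign. The short check below works through that arithmetic, assuming "integer" means Drill's 32-bit signed INT; the helper name integer_display_size is hypothetical and only illustrates the count.

```python
# Smallest 32-bit signed integer: -2147483648.
INT_MIN = -(2 ** 31)
# Largest 32-bit signed integer: 2147483647.
INT_MAX = 2 ** 31 - 1

# Boolean size as stated in the description.
BOOLEAN_SIZE = 1


def integer_display_size() -> int:
    """Characters needed for the widest INT value: sign + 10 digits."""
    return len(str(INT_MIN))
```

Here len(str(INT_MAX)) is 10 (the digits alone) and integer_display_size() is 11 once the sign is counted, matching the value quoted for integer columns.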
PARTITIONS table columns:
TABLE_CATALOG - table catalog (currently we have only one catalog): DRILL
TABLE_SCHEMA - table schema: dfs.tmp
TABLE_NAME - table name: nation
METADATA_KEY - top-level segment key, the same for all nested segments and partitions: part_int=3
METADATA_TYPE - SEGMENT or PARTITION
METADATA_IDENTIFIER - current metadata identifier: part_int=3/part_varchar=g
PARTITION_COLUMN - partition column name: part_varchar
PARTITION_VALUE - partition column value: g
LOCATION - segment location, null for partitions: /tmp/nation/part_int=3
LAST_MODIFIED_TIME - last modification time
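The example values in the PARTITIONS list relate to each other in a regular way: METADATA_KEY is the first element of METADATA_IDENTIFIER, the partition column and value come from the innermost column=value pair, and only segments carry a LOCATION. The sketch below derives those values from an identifier. It assumes, based only on the examples above, that identifiers are slash-separated paths of column=value pairs and that a segment's location is the table location joined with its identifier; it also assumes PARTITION_COLUMN and PARTITION_VALUE are null for segments. PartitionsRow and derive_row are hypothetical names, not Drill's implementation.

```python
from typing import NamedTuple, Optional


class PartitionsRow(NamedTuple):
    """Hypothetical subset of a PARTITIONS row (not Drill's own class)."""
    metadata_key: str
    metadata_type: str
    metadata_identifier: str
    partition_column: Optional[str]
    partition_value: Optional[str]
    location: Optional[str]


def derive_row(table_location: str, identifier: str, metadata_type: str) -> PartitionsRow:
    """Derive the example's values from an identifier like 'part_int=3/part_varchar=g'."""
    parts = identifier.split("/")
    key = parts[0]  # top-level segment key, shared by all nested entries
    if metadata_type == "PARTITION":
        # Assumed: the innermost path element holds the column=value pair.
        column, value = parts[-1].split("=", 1)
        # Partitions have no location of their own.
        return PartitionsRow(key, metadata_type, identifier, column, value, None)
    # Assumed: a segment's location is the table location plus its identifier.
    return PartitionsRow(key, metadata_type, identifier, None, None,
                         f"{table_location}/{identifier}")
```

With the values from the list, derive_row("/tmp/nation", "part_int=3/part_varchar=g", "PARTITION") gives key part_int=3, column part_varchar, value g, and no location, while derive_row("/tmp/nation", "part_int=3", "SEGMENT") gives the location /tmp/nation/part_int=3.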