Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
1. Merge the information from metadataStatistics and statisticsKinds into a single holder: Map<String, StatisticsHolder>.
2. Rename hasStatistics to hasDescriptiveStatistics.
3. Remove drill-file-metastore-plugin.
4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to the metadata module, rename it to MetadataType, and add a new value: SEGMENT.
5. Add JSON serialization/deserialization for ColumnStatistics and StatisticsHolder.
6. Add new info classes:
    class TableInfo {
      String storagePlugin;
      String workspace;
      String name;
      String type;
      String owner;
    }

    class MetadataInfo {
      public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
      public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
      MetadataType type; // enum
      String key;
      String identifier;
    }
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
  Missing fields
  ------------------
  storagePlugin, workspace, tableType -> will be covered by the TableInfo class
  metadataType, metadataKey -> will be covered by the MetadataInfo class
  interestingColumns

  Fields to modify
  ----------------
  private final Map<String, Object> tableStatistics;
  private final Map<String, StatisticsKind> statisticsKinds;
  private final Set<String> partitionKeys; -> Map<String, String>
org.apache.drill.metastore.PartitionMetadata
  Missing fields
  ------------------
  storagePlugin, workspace -> will be covered by the TableInfo class
  metadataType, metadataKey, metadataIdentifier -> will be covered by the MetadataInfo class
  partitionValues (List<String>)
  location (String) - directory location (for directory-level metadata)

  Fields to modify
  ----------------
  private final Map<String, Object> tableStatistics;
  private final Map<String, StatisticsKind> statisticsKinds;
  private final Set<Path> location; -> locations
org.apache.drill.metastore.FileMetadata
  Missing fields
  ------------------
  storagePlugin, workspace -> will be covered by the TableInfo class
  metadataType, metadataKey, metadataIdentifier -> will be covered by the MetadataInfo class
  path - path to the file

  Fields to modify
  ----------------
  private final Map<String, Object> tableStatistics;
  private final Map<String, StatisticsKind> statisticsKinds;
  private final Path location; - should contain the directory to which the file belongs
org.apache.drill.metastore.RowGroupMetadata
  Missing fields
  ------------------
  storagePlugin, workspace -> will be covered by the TableInfo class
  metadataType, metadataKey, metadataIdentifier -> will be covered by the MetadataInfo class
  path - path to the file

  Fields to modify
  ----------------
  private final Map<String, Object> tableStatistics;
  private final Map<String, StatisticsKind> statisticsKinds;
  private final Path location; - should contain the directory to which the file belongs
8. Remove org.apache.drill.exec package from metastore module.
9. Rename ColumnStatisticsImpl class.
10. Separate existing classes in org.apache.drill.metastore package into sub-packages.
11. Rename FileTableMetadata -> BaseTableMetadata.
12. Rename TableMetadataProvider.getNonInterestingColumnsMeta() -> getNonInterestingColumnsMetadata().
13. Introduce segment-level metadata class:
    class SegmentMetadata {
      TableInfo tableInfo;
      MetadataInfo metadataInfo;
      SchemaPath column;
      TupleMetadata schema;
      String location;
      Map<SchemaPath, ColumnStatistics> columnsStatistics;
      Map<String, StatisticsHolder> statistics;
      List<String> partitionValues;
      List<String> locations;
      long lastModifiedTime;
    }
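Item 1 can be illustrated with a minimal Java sketch. It assumes a StatisticsHolder that simply pairs a statistic's value with its StatisticsKind; the field names (statisticsValue, statisticsKind), the exact flag on the stand-in StatisticsKind, and the merge helper are all hypothetical and not taken from the Drill code base.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class StatisticsHolderSketch {

  // Hypothetical stand-in for StatisticsKind: a name plus an exactness flag.
  static final class StatisticsKind {
    final String name;
    final boolean exact;

    StatisticsKind(String name, boolean exact) {
      this.name = name;
      this.exact = exact;
    }
  }

  // Hypothetical StatisticsHolder: a statistic's value bundled with its kind.
  static final class StatisticsHolder {
    final Object statisticsValue;
    final StatisticsKind statisticsKind;

    StatisticsHolder(Object statisticsValue, StatisticsKind statisticsKind) {
      this.statisticsValue = statisticsValue;
      this.statisticsKind = statisticsKind;
    }
  }

  // Folds the two parallel maps (value by name, kind by name) into one map;
  // throws NullPointerException if a value has no matching kind.
  static Map<String, StatisticsHolder> merge(Map<String, Object> statistics,
                                             Map<String, StatisticsKind> kinds) {
    Map<String, StatisticsHolder> merged = new HashMap<>();
    for (Map.Entry<String, Object> entry : statistics.entrySet()) {
      StatisticsKind kind = Objects.requireNonNull(
          kinds.get(entry.getKey()), "no StatisticsKind for " + entry.getKey());
      merged.put(entry.getKey(), new StatisticsHolder(entry.getValue(), kind));
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> statistics = new HashMap<>();
    statistics.put("rowCount", 100L);
    Map<String, StatisticsKind> kinds = new HashMap<>();
    kinds.put("rowCount", new StatisticsKind("rowCount", true));

    StatisticsHolder holder = merge(statistics, kinds).get("rowCount");
    // prints "100 exact=true"
    System.out.println(holder.statisticsValue + " exact=" + holder.statisticsKind.exact);
  }
}
```

Keeping the value and its kind in one object means the two pieces of information can no longer drift out of sync, which is one plausible benefit of the single Map<String, StatisticsHolder>.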
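The path/location split described for FileMetadata and RowGroupMetadata in item 7 can be sketched as follows, assuming location is simply derived as the parent directory of path. The trimmed-down FileMetadata class is hypothetical, and java.nio.file.Path is used only to keep the example self-contained; the Drill classes may use Hadoop's Path instead.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileLocationSketch {

  // Hypothetical, trimmed-down file-level metadata holding only the two
  // path-related fields discussed in item 7.
  static final class FileMetadata {
    final Path path;      // path to the file itself
    final Path location;  // directory to which the file belongs

    FileMetadata(Path path) {
      this.path = path;
      // Assumption: location is always the file's parent directory.
      this.location = path.getParent();
    }
  }

  public static void main(String[] args) {
    FileMetadata f = new FileMetadata(Paths.get("/data/t/2020/a.parquet"));
    System.out.println(f.path + " in " + f.location);
  }
}
```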
Segment metadata
One of the changes in the fix for this Jira is the introduction of segment-level metadata.
With this change, the metadata hierarchy is as follows:
- Table
- Segment
- Partition
- File
- Row group
A segment represents a part of the table grouped by some specific qualities. For example, for file system tables, a segment may correspond to a directory with its data. For Hive tables, a segment corresponds to a Hive partition.
In contrast, partition metadata corresponds to "Drill partitions": groups of data which have the same values for specific columns within a file or row group.
Filtering is therefore applied first at the table level, then to segments, then to partitions, then to files, and finally to row groups.
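The level-by-level filtering order described above can be sketched in Java. The MetadataType enum here is a hypothetical sketch whose declaration order mirrors the hierarchy; the Unit class, the prune helper, and the example locations are also hypothetical. For brevity, the example attaches files directly to segments and has no partition-level units.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class MetadataHierarchySketch {

  // Hypothetical enum: levels are declared from coarsest to finest, so
  // values() returns them in the filtering order.
  enum MetadataType { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP }

  // Hypothetical metadata unit: its level, its location, and its parent (null for the table).
  static final class Unit {
    final MetadataType type;
    final String location;
    final Unit parent;

    Unit(MetadataType type, String location, Unit parent) {
      this.type = type;
      this.location = location;
      this.parent = parent;
    }
  }

  // Visits levels coarse to fine; a unit is kept only if its parent was kept
  // and it passes the filter, so pruning a segment also prunes everything below it.
  static List<Unit> prune(List<Unit> units, Predicate<Unit> filter) {
    List<Unit> kept = new ArrayList<>();
    for (MetadataType level : MetadataType.values()) {
      for (Unit unit : units) {
        if (unit.type != level) {
          continue;
        }
        boolean parentKept = unit.parent == null || kept.contains(unit.parent);
        if (parentKept && filter.test(unit)) {
          kept.add(unit);
        }
      }
    }
    return kept;
  }

  // Builds a small table with two directory segments and prunes one of them.
  static List<String> demo() {
    Unit table = new Unit(MetadataType.TABLE, "/t", null);
    Unit s2019 = new Unit(MetadataType.SEGMENT, "/t/2019", table);
    Unit s2020 = new Unit(MetadataType.SEGMENT, "/t/2020", table);
    Unit f2019 = new Unit(MetadataType.FILE, "/t/2019/a.parquet", s2019);
    Unit f2020 = new Unit(MetadataType.FILE, "/t/2020/b.parquet", s2020);
    Unit rg = new Unit(MetadataType.ROW_GROUP, "/t/2020/b.parquet#0", f2020);

    // Rejects only the 2019 segment; its file is dropped because its parent is gone.
    Predicate<Unit> filter = u -> !u.location.equals("/t/2019");
    List<String> result = new ArrayList<>();
    for (Unit u : prune(Arrays.asList(rg, f2019, f2020, s2019, s2020, table), filter)) {
      result.add(u.location);
    }
    return result;
  }

  public static void main(String[] args) {
    // prints "[/t, /t/2020, /t/2020/b.parquet, /t/2020/b.parquet#0]"
    System.out.println(demo());
  }
}
```

Because coarser levels are processed first, a segment rejected early never has its files or row groups evaluated against the filter, which is the benefit of filtering from the table level downward.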