  1. Apache Drill
  2. DRILL-6552 Drill Metadata management "Drill Metastore"
  3. DRILL-7271

Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 1.17.0
    • None

    Description

      1. Merge info from metadataStatistics + statisticsKinds into one holder: Map<String, StatisticsHolder>.
      2. Rename hasStatistics to hasDescriptiveStatistics
      3. Remove drill-file-metastore-plugin
      4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT.
      5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
      6. Add new info classes:

      class TableInfo {
        String storagePlugin;
        String workspace;
        String name;
        String type;
        String owner;
      }
      
      class MetadataInfo {
      
        public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
        public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
      
        MetadataType type; // enum
        String key;
        String identifier;
      }
      

      7. Modify existing metadata classes:
      org.apache.drill.metastore.FileTableMetadata

      missing fields
      ------------------
      storagePlugin, workspace, tableType -> will be covered by TableInfo class
      metadataType, metadataKey -> will be covered by MetadataInfo class
      interestingColumns
      
      fields to modify
      ----------------
      private final Map<String, Object> tableStatistics;
      private final Map<String, StatisticsKind> statisticsKinds;
      private final Set<String> partitionKeys; -> Map<String, String>
      

      org.apache.drill.metastore.PartitionMetadata

      missing fields
      ------------------
      storagePlugin, workspace -> will be covered by TableInfo class
      metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
      partitionValues (List<String>)
      location (String) (for directory level metadata) - directory location
      
      fields to modify
      ----------------
      private final Map<String, Object> tableStatistics;
      private final Map<String, StatisticsKind> statisticsKinds;
      private final Set<Path> location; -> locations
      

      org.apache.drill.metastore.FileMetadata

      missing fields
      ------------------
      storagePlugin, workspace -> will be covered by TableInfo class
      metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
      path - path to file 
      
      fields to modify
      ----------------
      private final Map<String, Object> tableStatistics;
      private final Map<String, StatisticsKind> statisticsKinds;
      private final Path location; - should contain directory to which file belongs
      

      org.apache.drill.metastore.RowGroupMetadata

      missing fields
      ------------------
      storagePlugin, workspace -> will be covered by TableInfo class
      metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
      path - path to file 
      
      fields to modify
      ----------------
      private final Map<String, Object> tableStatistics;
      private final Map<String, StatisticsKind> statisticsKinds;
      private final Path location; - should contain directory to which file belongs
      

      8. Remove org.apache.drill.exec package from metastore module.
      9. Rename ColumnStatisticsImpl class.
      10. Separate existing classes in org.apache.drill.metastore package into sub-packages.
      11. Rename FileTableMetadata -> BaseTableMetadata
      12. TableMetadataProvider.getNonInterestingColumnsMeta() -> getNonInterestingColumnsMetadata
      13. Introduce segment-level metadata class:

      class SegmentMetadata {
        TableInfo tableInfo;
        MetadataInfo metadataInfo;
        SchemaPath column;
        TupleMetadata schema;
        String location;
        Map<SchemaPath, ColumnStatistics> columnsStatistics;
        Map<String, StatisticsHolder> statistics;
        List<String> partitionValues;
        List<String> locations;
        long lastModifiedTime;
      }
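
      As an illustration of item 1 (merging the statistics values and their kinds into a single Map<String, StatisticsHolder>), here is a minimal sketch. The classes Kind and Holder and the method merge are hypothetical stand-ins invented for this example; they are not the actual Drill classes, and the real StatisticsHolder may look different.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StatisticsMergeSketch {

  // Hypothetical stand-in for a statistics kind descriptor (assumption, not Drill's class).
  static final class Kind {
    final String name;
    final boolean exact;

    Kind(String name, boolean exact) {
      this.name = name;
      this.exact = exact;
    }
  }

  // Hypothetical holder that keeps a statistics value together with its kind.
  static final class Holder {
    final Object value;
    final Kind kind;

    Holder(Object value, Kind kind) {
      this.value = value;
      this.kind = kind;
    }

    @Override
    public String toString() {
      return kind.name + "=" + value + (kind.exact ? " (exact)" : " (estimated)");
    }
  }

  // Combines the two parallel maps into one map keyed by statistic name.
  static Map<String, Holder> merge(Map<String, Object> values, Map<String, Kind> kinds) {
    Map<String, Holder> merged = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : values.entrySet()) {
      Kind kind = kinds.get(entry.getKey());
      if (kind == null) {
        // A value without a kind cannot be interpreted, so fail loudly.
        throw new IllegalArgumentException("No kind for statistic: " + entry.getKey());
      }
      merged.put(entry.getKey(), new Holder(entry.getValue(), kind));
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> values = new LinkedHashMap<>();
    values.put("rowCount", 100L);
    Map<String, Kind> kinds = new LinkedHashMap<>();
    kinds.put("rowCount", new Kind("rowCount", true));
    System.out.println(merge(values, kinds));
  }
}
```

      Keeping the value and its kind in one holder means the two pieces can no longer drift apart, which is presumably the motivation for replacing the two parallel maps.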
      

      Segment metadata

      One of the changes in the fix for this Jira is the introduction of segment-level metadata.

      For now, the metadata hierarchy is as follows:

      • Table
      • Segment
      • Partition
      • File
      • Row group

      A segment represents a part of the table whose data is grouped by some specific quality. For example, for file system tables, a segment may correspond to a directory and the data it contains. For Hive tables, a segment corresponds to a Hive partition.

      In contrast, partition metadata corresponds to "Drill partitions": groups of data that have the same values for specific columns within a file or row group.
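
      For the file system case, the directory-to-segment mapping could look roughly like the sketch below. It groups file paths by their parent directory, with each directory playing the role of one segment. The class and method names are hypothetical, and plain strings are used in place of whatever path type the real implementation uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentGroupingSketch {

  // Returns the parent directory of a '/'-separated path.
  static String parentDirectory(String path) {
    int slash = path.lastIndexOf('/');
    if (slash > 0) {
      return path.substring(0, slash);
    }
    // A file directly under the root, or a bare file name with no directory part.
    return slash == 0 ? "/" : ".";
  }

  // Groups files by parent directory; each map key stands for one segment.
  static Map<String, List<String>> groupByDirectory(List<String> filePaths) {
    Map<String, List<String>> segments = new TreeMap<>();
    for (String path : filePaths) {
      segments.computeIfAbsent(parentDirectory(path), key -> new ArrayList<>()).add(path);
    }
    return segments;
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList(
        "/table/2018/a.parquet",
        "/table/2018/b.parquet",
        "/table/2019/c.parquet");
    System.out.println(groupByDirectory(files));
  }
}
```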

      So filtering will be performed at the table level first, then for segments, then for partitions, then for files, and finally for row groups.
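
      The top-down filtering order can be sketched as a tree walk that stops descending as soon as a level rules out the filter value. Everything in this example is an assumption made for illustration: the Level enum, the Node class with a single min/max range, and the prune method are hypothetical and do not reflect the real Drill classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HierarchicalPruningSketch {

  // Levels in the order in which filtering visits them (illustrative names).
  enum Level { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP }

  // Hypothetical metadata node holding the min/max of one numeric column.
  static final class Node {
    final Level level;
    final String name;
    final long min;
    final long max;
    final List<Node> children;

    Node(Level level, String name, long min, long max, Node... children) {
      this.level = level;
      this.name = name;
      this.min = min;
      this.max = max;
      this.children = Arrays.asList(children);
    }
  }

  // Collects the row groups that may contain the value, skipping excluded subtrees.
  static void prune(Node node, long value, List<String> matched) {
    if (value < node.min || value > node.max) {
      return; // everything below this node is filtered out at this level
    }
    if (node.level == Level.ROW_GROUP) {
      matched.add(node.name);
      return;
    }
    for (Node child : node.children) {
      prune(child, value, matched);
    }
  }

  // Builds a small table with two segments for demonstration purposes.
  static Node sampleTable() {
    Node rg1 = new Node(Level.ROW_GROUP, "rg1", 0, 20);
    Node rg2 = new Node(Level.ROW_GROUP, "rg2", 21, 50);
    Node rg3 = new Node(Level.ROW_GROUP, "rg3", 51, 100);
    Node file1 = new Node(Level.FILE, "f1", 0, 50, rg1, rg2);
    Node file2 = new Node(Level.FILE, "f2", 51, 100, rg3);
    Node part1 = new Node(Level.PARTITION, "p1", 0, 50, file1);
    Node part2 = new Node(Level.PARTITION, "p2", 51, 100, file2);
    Node seg1 = new Node(Level.SEGMENT, "s1", 0, 50, part1);
    Node seg2 = new Node(Level.SEGMENT, "s2", 51, 100, part2);
    return new Node(Level.TABLE, "t", 0, 100, seg1, seg2);
  }

  public static void main(String[] args) {
    List<String> matched = new ArrayList<>();
    prune(sampleTable(), 30, matched);
    System.out.println(matched);
  }
}
```

      In this sketch the second segment is rejected by a single range check, so its partition, file and row group are never inspected. That is the benefit of filtering at the coarser levels first.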

            People

              volodymyr Vova Vysotskyi
              arina Arina Ielchiieva
              Arina Ielchiieva Arina Ielchiieva