  1. Apache Drill
  2. DRILL-6552 Drill Metadata management "Drill MetaStore"
  3. DRILL-7271

Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

    Details

    • Type: Sub-task
    • Status: Reviewable
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.17.0
    • Component/s: None
    • Labels: None

      Description

      1. Merge the information from metadataStatistics and statisticsKinds into a single holder: Map<String, StatisticsHolder>.
      2. Rename hasStatistics to hasDescriptiveStatistics.
      3. Remove drill-file-metastore-plugin.
      4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to the metadata module, rename it to MetadataType, and add a new value: SEGMENT.
      5. Add JSON serialization/deserialization for ColumnStatistics and StatisticsHolder.
      6. Add new info classes:

      class TableInfo {
        String storagePlugin;
        String workspace;
        String name;
        String type;
        String owner;
      }
      
      class MetadataInfo {
      
        public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
        public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
      
        MetadataType type; // enum
        String key;
        String identifier;
      }
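
      The info classes and the relocated enum could be sketched as below. This is only an illustration under stated assumptions, not the final Drill code: the enum values other than SEGMENT are assumed to be the ones carried over from MetadataLevel, and the constructors, field visibility, and immutability are hypothetical choices made for the example.

```java
// Sketch only. SEGMENT is the new value named in this issue; the remaining
// values are assumed to be those inherited from MetadataLevel.
enum MetadataType {
  TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP
}

// Identifies the table that a metadata unit belongs to.
final class TableInfo {
  final String storagePlugin;
  final String workspace;
  final String name;
  final String type;
  final String owner;

  // Hypothetical constructor; the issue lists only the fields.
  TableInfo(String storagePlugin, String workspace, String name, String type, String owner) {
    this.storagePlugin = storagePlugin;
    this.workspace = workspace;
    this.name = name;
    this.type = type;
    this.owner = owner;
  }
}

// Identifies a single metadata unit within a table.
final class MetadataInfo {
  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

  final MetadataType type;
  final String key;
  final String identifier;

  // Hypothetical constructor; the issue lists only the fields.
  MetadataInfo(MetadataType type, String key, String identifier) {
    this.type = type;
    this.key = key;
    this.identifier = identifier;
  }
}
```

      With these two classes, each metadata class listed in item 7 can hold one TableInfo and one MetadataInfo instead of repeating storagePlugin, workspace, metadataType, and metadataKey as separate fields.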
      

      7. Modify existing metadata classes:
      org.apache.drill.metastore.FileTableMetadata

      missing fields
      ------------------
      storagePlugin, workspace, tableType -> will be covered by TableInfo class
      metadataType, metadataKey -> will be covered by MetadataInfo class
      interestingColumns
      
      fields to modify
      ----------------
      private final Map<String, Object> tableStatistics;
      private final Map<String, StatisticsKind> statisticsKinds;
      private final Set<String> partitionKeys; -> Map<String, String>
      

      org.apache.drill.metastore.PartitionMetadata

      missing fields
      ------------------
      storagePlugin, workspace -> will be covered by TableInfo class
      metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
      partitionValues (List<String>)
      location (String) (for directory level metadata) - directory location
      
      fields to modify
      ----------------
      private final Map<String, Object> tableStatistics;
      private final Map<String, StatisticsKind> statisticsKinds;
      private final Set<Path> location; -> locations
      

      org.apache.drill.metastore.FileMetadata

      missing fields
      ------------------
      storagePlugin, workspace -> will be covered by TableInfo class
      metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
      path - path to file 
      
      fields to modify
      ----------------
      private final Map<String, Object> tableStatistics;
      private final Map<String, StatisticsKind> statisticsKinds;
      private final Path location; - should contain directory to which file belongs
      

      org.apache.drill.metastore.RowGroupMetadata

      missing fields
      ------------------
      storagePlugin, workspace -> will be covered by TableInfo class
      metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
      path - path to file 
      
      fields to modify
      ----------------
      private final Map<String, Object> tableStatistics;
      private final Map<String, StatisticsKind> statisticsKinds;
      private final Path location; - should contain directory to which file belongs
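
      The "fields to modify" entries above all replace the pair tableStatistics / statisticsKinds with the single Map<String, StatisticsHolder> from item 1. One possible shape of that merge is sketched below. The StatisticsKind and StatisticsHolder types here are simplified, hypothetical stand-ins (the real Drill types may differ), and StatisticsMerger and its merge method are names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Drill's StatisticsKind; only the name is modelled.
interface StatisticsKind {
  String getName();
}

// Hypothetical holder that pairs a statistics value with its kind.
final class StatisticsHolder {
  final Object value;
  final StatisticsKind kind;

  StatisticsHolder(Object value, StatisticsKind kind) {
    this.value = value;
    this.kind = kind;
  }
}

// Hypothetical helper, not part of Drill.
final class StatisticsMerger {
  // Combines the two parallel maps into one map keyed by statistics name.
  // Values without a matching kind are skipped rather than stored half-described.
  static Map<String, StatisticsHolder> merge(Map<String, Object> tableStatistics,
                                             Map<String, StatisticsKind> statisticsKinds) {
    Map<String, StatisticsHolder> merged = new HashMap<>();
    for (Map.Entry<String, Object> entry : tableStatistics.entrySet()) {
      StatisticsKind kind = statisticsKinds.get(entry.getKey());
      if (kind != null) {
        merged.put(entry.getKey(), new StatisticsHolder(entry.getValue(), kind));
      }
    }
    return merged;
  }
}
```

      Keeping the value and its kind in one holder means the two can no longer drift out of sync, which is the main motivation for collapsing the two maps. Skipping values with no declared kind is a choice made for this sketch; failing fast would be an equally reasonable policy.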
      

      8. Remove the org.apache.drill.exec package from the metastore module.
      9. Rename ColumnStatisticsImpl class.
      10. Separate the existing classes in the org.apache.drill.metastore package into sub-packages.
      11. Rename FileTableMetadata -> BaseTableMetadata.
      12. Rename TableMetadataProvider.getNonInterestingColumnsMeta() -> getNonInterestingColumnsMetadata().
      13. Introduce segment-level metadata class:

      class SegmentMetadata {
        TableInfo tableInfo;
        MetadataInfo metadataInfo;
        SchemaPath column;
        TupleMetadata schema;
        String location;
        Map<SchemaPath, ColumnStatistics> columnsStatistics;
        Map<String, StatisticsHolder> statistics;
        List<String> partitionValues;
        List<String> locations;
        long lastModifiedTime;
      }
      

              People

              • Assignee: Volodymyr Vysotskyi (vvysotskyi)
              • Reporter: Arina Ielchiieva (arina)
              • Reviewer: Arina Ielchiieva
              • Votes: 0
              • Watchers: 3