IMPALA-10727

Share identical CachedHmsPartitionDescriptor across HdfsPartitions

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Catalog
    • Labels: None

    Description

      In catalogd, we keep one CachedHmsPartitionDescriptor for each HdfsPartition. Many of its fields could be identical across partitions, e.g. sdBucketCols and sdSortCols. Instead, we can keep the distinct CachedHmsPartitionDescriptor instances in HdfsTable and share them among its HdfsPartitions. Fields that differ across partitions, e.g. msCreateTime and msLastAccessTime, can be moved to HdfsPartition.

      https://github.com/apache/impala/blob/1a84a1420c5d517f43e4c7e90ee204db30f27d57/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L543

        // TODO: Cache this descriptor in HdfsTable so that identical descriptors are shared
        // between HdfsPartition instances.
        // TODO: sdInputFormat and sdOutputFormat can be mutated by Impala when the file format
        // of a partition changes; move these fields to HdfsPartition.
        private static class CachedHmsPartitionDescriptor {
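
      The sharing idea could be sketched roughly as follows. This is only an illustration, not Impala code: SharedDescriptor, TableSketch and PartitionSketch are hypothetical stand-ins for CachedHmsPartitionDescriptor, HdfsTable and HdfsPartition. It assumes the shared descriptor is immutable and compares by value, and it keeps only the two fields named above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for CachedHmsPartitionDescriptor. It holds only fields
// that are expected to be identical across many partitions and is immutable,
// so a single instance can safely be shared.
final class SharedDescriptor {
  final List<String> sdBucketCols;
  final List<String> sdSortCols;

  SharedDescriptor(List<String> sdBucketCols, List<String> sdSortCols) {
    // Defensive copies so later changes to the caller's lists cannot leak in.
    this.sdBucketCols = Collections.unmodifiableList(new ArrayList<>(sdBucketCols));
    this.sdSortCols = Collections.unmodifiableList(new ArrayList<>(sdSortCols));
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SharedDescriptor)) return false;
    SharedDescriptor other = (SharedDescriptor) o;
    return sdBucketCols.equals(other.sdBucketCols) && sdSortCols.equals(other.sdSortCols);
  }

  @Override
  public int hashCode() {
    return Objects.hash(sdBucketCols, sdSortCols);
  }
}

// Hypothetical stand-in for HdfsTable: owns the set of distinct descriptors.
final class TableSketch {
  private final Map<SharedDescriptor, SharedDescriptor> descriptors = new HashMap<>();

  // Returns the canonical instance equal to 'd', registering 'd' if it is new.
  SharedDescriptor intern(SharedDescriptor d) {
    SharedDescriptor existing = descriptors.putIfAbsent(d, d);
    return existing != null ? existing : d;
  }

  int numDistinctDescriptors() {
    return descriptors.size();
  }
}

// Hypothetical stand-in for HdfsPartition: per-partition fields live here,
// everything else is reached through the shared descriptor.
final class PartitionSketch {
  final SharedDescriptor descriptor;
  final int msCreateTime;
  final int msLastAccessTime;

  PartitionSketch(SharedDescriptor descriptor, int msCreateTime, int msLastAccessTime) {
    this.descriptor = descriptor;
    this.msCreateTime = msCreateTime;
    this.msLastAccessTime = msLastAccessTime;
  }
}
```

      In this sketch the map never drops entries, so a descriptor stays cached even after the last partition using it is gone. A real change would probably need reference counting or a weak interner, and would have to handle concurrent access to the table.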
      

          People

            Assignee: Unassigned
            Reporter: Quanlong Huang (stigahuang)