IMPALA-12487

Skip reloading file metadata for ALTER_TABLE events with trivial changes in StorageDescriptor


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.4.0
    • Component/s: None
    • Labels: None

    Description

      IMPALA-11534 skips reloading file metadata for some trivial ALTER_TABLE events. However, it does not handle ALTER_TABLE events whose changes are limited to trivial changes in the StorageDescriptor, and some of these events could also skip reloading file metadata. The Thrift definition of StorageDescriptor is shown below; not all of its fields are related to file metadata:

      // this object holds all the information about physical storage of the data belonging to a table
      struct StorageDescriptor {
        1: list<FieldSchema> cols,  // required (refer to types defined above)
        2: string location,         // defaults to <warehouse loc>/<db loc>/tablename
        3: string inputFormat,      // SequenceFileInputFormat (binary) or TextInputFormat or custom format
        4: string outputFormat,     // SequenceFileOutputFormat (binary) or IgnoreKeyTextOutputFormat or custom format
        5: bool   compressed,       // compressed or not
        6: i32    numBuckets,       // this must be specified if there are any dimension columns
        7: SerDeInfo    serdeInfo,  // serialization and deserialization information
        8: list<string> bucketCols, // reducer grouping columns and clustering columns and bucketing columns
        9: list<Order>  sortCols,   // sort order of the data in each bucket
        10: map<string, string> parameters, // any user supplied key value hash
        11: optional SkewedInfo skewedInfo, // skewed information
        12: optional bool   storedAsSubDirectories       // stored as subdirectories or not
      } 

      The attached screenshot compares the Table object before and after an ALTER_TABLE event whose only change is a trivial one in the StorageDescriptor: the event just clears the field 'storedAsSubDirectories:false'. Since that field defaults to false, the StorageDescriptor is effectively unchanged.

      I think we can compare the StorageDescriptor before and after the event and only reload file metadata if any of these fields change:

      • 'location'
      • 'storedAsSubDirectories'

      Note that the default of 'storedAsSubDirectories' is false, so removing 'storedAsSubDirectories:false' is considered unchanged.
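      A minimal sketch of the proposed comparison, written in Java. SdSnapshot and SdChangeCheck are hypothetical names, not Impala's actual classes. SdSnapshot is a simplified stand-in for the Thrift-generated StorageDescriptor that models only the two fields listed above, using a null Boolean to represent an unset optional 'storedAsSubDirectories'. Following the proposal, all other StorageDescriptor fields are assumed not to require a file metadata reload.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for the Thrift StorageDescriptor.
// Only the two fields named in the proposal are modeled.
final class SdSnapshot {
  final String location;
  // null models an unset optional Thrift field; the default is false.
  final Boolean storedAsSubDirectories;

  SdSnapshot(String location, Boolean storedAsSubDirectories) {
    this.location = location;
    this.storedAsSubDirectories = storedAsSubDirectories;
  }
}

// Hypothetical helper implementing the comparison proposed in this issue.
final class SdChangeCheck {
  private SdChangeCheck() {}

  // Effective value of the optional flag: unset (null) is treated as false,
  // so 'storedAsSubDirectories:false' and an unset field compare equal.
  static boolean effectiveStoredAsSubDirs(SdSnapshot sd) {
    return Boolean.TRUE.equals(sd.storedAsSubDirectories);
  }

  // Returns true only if 'location' or the effective
  // 'storedAsSubDirectories' value differs between the two snapshots.
  static boolean needsFileMetadataReload(SdSnapshot before, SdSnapshot after) {
    if (!Objects.equals(before.location, after.location)) {
      return true;
    }
    return effectiveStoredAsSubDirs(before) != effectiveStoredAsSubDirs(after);
  }
}
```

      For the case in the screenshot, needsFileMetadataReload(new SdSnapshot(loc, false), new SdSnapshot(loc, null)) returns false, so the event could skip reloading file metadata. Normalizing the optional field to its default before comparing is what makes "cleared" and "explicitly false" equivalent.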

      CC hemanth619, csringhofer 


            People

              Assignee: Sai Hemanth Gantasala (hemanth619)
              Reporter: Quanlong Huang (stigahuang)
              Votes: 0
              Watchers: 4
