Details
- Type: Improvement
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
Description
IMPALA-11534 skips reloading file metadata for some trivial ALTER_TABLE events. However, it does not handle ALTER_TABLE events whose only changes are trivial changes in the StorageDescriptor. Some of these events can also skip reloading file metadata. The Thrift definition of StorageDescriptor follows (not all of its fields are related to file metadata):
// this object holds all the information about physical storage of the data belonging to a table
struct StorageDescriptor {
  1: list<FieldSchema> cols, // required (refer to types defined above)
  2: string location, // defaults to <warehouse loc>/<db loc>/tablename
  3: string inputFormat, // SequenceFileInputFormat (binary) or TextInputFormat or custom format
  4: string outputFormat, // SequenceFileOutputFormat (binary) or IgnoreKeyTextOutputFormat or custom format
  5: bool compressed, // compressed or not
  6: i32 numBuckets, // this must be specified if there are any dimension columns
  7: SerDeInfo serdeInfo, // serialization and deserialization information
  8: list<string> bucketCols, // reducer grouping columns and clustering columns and bucketing columns
  9: list<Order> sortCols, // sort order of the data in each bucket
  10: map<string, string> parameters, // any user supplied key value hash
  11: optional SkewedInfo skewedInfo, // skewed information
  12: optional bool storedAsSubDirectories // stored as subdirectories or not
}
The attached screenshot is an example comparing the Table object before and after an ALTER_TABLE event that has trivial changes in the StorageDescriptor. The event just clears the 'storedAsSubDirectories:false' field, and that field defaults to false, so it actually makes no difference to the StorageDescriptor.
I think we can compare the StorageDescriptor before and after the event and only reload file metadata if either of these fields changes:
- 'location'
- 'storedAsSubDirectories'
Note that 'storedAsSubDirectories' defaults to false, so removing 'storedAsSubDirectories:false' is considered unchanged.
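The proposed comparison could be sketched roughly as follows. This is only an illustration of the rule described above, not Impala's actual implementation: SdView is a hypothetical, simplified stand-in for the Thrift-generated StorageDescriptor, the class and method names are made up for this example, and the paths are placeholders. An unset optional 'storedAsSubDirectories' is modeled as null and treated as false.

```java
import java.util.Objects;

class SdChangeCheck {
    /** Hypothetical stand-in holding only the two fields the proposal compares. */
    static final class SdView {
        final String location;                // table location; may be null if unset
        final Boolean storedAsSubDirectories; // null models an unset optional field

        SdView(String location, Boolean storedAsSubDirectories) {
            this.location = location;
            this.storedAsSubDirectories = storedAsSubDirectories;
        }
    }

    /** Resolves the optional flag to its effective value; unset means false. */
    static boolean effectiveStoredAsSubDirs(SdView sd) {
        return sd.storedAsSubDirectories != null && sd.storedAsSubDirectories;
    }

    /**
     * Returns true only if 'location' or the effective value of
     * 'storedAsSubDirectories' differs between the two descriptors.
     */
    static boolean needsFileMetadataReload(SdView before, SdView after) {
        if (!Objects.equals(before.location, after.location)) {
            return true;
        }
        return effectiveStoredAsSubDirs(before) != effectiveStoredAsSubDirs(after);
    }

    public static void main(String[] args) {
        SdView before = new SdView("/warehouse/db/tbl", Boolean.FALSE);
        // Same table after an event that merely clears 'storedAsSubDirectories:false'.
        SdView cleared = new SdView("/warehouse/db/tbl", null);
        // Same table after its location was changed.
        SdView moved = new SdView("/warehouse/db/tbl_new", Boolean.FALSE);

        System.out.println(needsFileMetadataReload(before, cleared)); // false
        System.out.println(needsFileMetadataReload(before, moved));   // true
    }
}
```

Comparing the effective value of the flag rather than its set/unset state is what makes the cleared 'storedAsSubDirectories:false' case count as unchanged.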
Attachments
Issue Links
- causes IMPALA-13403 Trivial changes in StorageDescriptor of ALTER_TABLE event is not enough to decide file metadata reload can be skipped (Open)