Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
The proposal is to reuse serde info in the same way we already reuse column descriptors (HIVE-2246).
Currently, we store the metadata for partitions as PARTITIONS (N partitions) -> SDS (N locations) -> SERDES (N entries). However, unless a serde is explicitly specified for a partition, the SERDES entries for all partitions of a table are identical. That is, each storage descriptor has its own exclusive serde info, yet the partitions' serde infos are mostly just copies of the table's. By reusing the serde info, we can save storage in the backend database and improve the performance of the queries HMS issues against it.
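To make the storage difference concrete, here is a minimal in-memory sketch in Java. It is not HMS code: the class `SerdeSharingSketch` and all of its methods are hypothetical, and the two maps merely stand in for rows of SERDES and for the serde reference held by each SDS row.

```java
import java.util.HashMap;
import java.util.Map;

public class SerdeSharingSketch {
    // serdeId -> serialization library; stands in for rows of SERDES (hypothetical model).
    private final Map<Long, String> serdes = new HashMap<>();
    // sdId -> serdeId; stands in for rows of SDS and the serde each one references.
    private final Map<Long, Long> sdToSerde = new HashMap<>();
    private long nextSerdeId = 1;
    private long nextSdId = 1;

    // Creates one serde record and returns its id.
    public long addSerde(String lib) {
        long id = nextSerdeId++;
        serdes.put(id, lib);
        return id;
    }

    // Current layout: every partition's storage descriptor gets its own serde record.
    public long addPartitionExclusive(String lib) {
        long sdId = nextSdId++;
        sdToSerde.put(sdId, addSerde(lib));
        return sdId;
    }

    // Proposed layout: the partition's storage descriptor points at the table's serde record.
    public long addPartitionShared(long tableSerdeId) {
        if (!serdes.containsKey(tableSerdeId)) {
            throw new IllegalArgumentException("unknown serde id " + tableSerdeId);
        }
        long sdId = nextSdId++;
        sdToSerde.put(sdId, tableSerdeId);
        return sdId;
    }

    public int serdeCount() { return serdes.size(); }
    public int sdCount() { return sdToSerde.size(); }
}
```

In this model, a table with three partitions holds four serde records under the exclusive layout (one for the table, one per partition) and a single record under the shared layout.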
For backward compatibility, we also need to introduce a config option for this feature, because there will be issues if an old HMS instance and a new HMS instance with this feature are running together. With this feature, we need to check whether anything else still references a serde before deleting it, but an old instance will just delete it.
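The deletion hazard can be sketched as follows. This is an illustrative in-memory model, not HMS code: `SerdeDeleteSketch`, its methods, and the `serdeSharingEnabled` flag are all hypothetical names, since the issue does not name the actual config property.

```java
import java.util.HashMap;
import java.util.Map;

public class SerdeDeleteSketch {
    // Hypothetical flag: true models a new instance with the feature enabled,
    // false models an old instance (or a new one with the feature turned off).
    private final boolean serdeSharingEnabled;
    // serdeId -> serialization library; stands in for rows of SERDES.
    private final Map<Long, String> serdes = new HashMap<>();
    // sdId -> serdeId; stands in for the serde reference of each SDS row.
    private final Map<Long, Long> sdToSerde = new HashMap<>();
    private long nextSerdeId = 1;

    public SerdeDeleteSketch(boolean serdeSharingEnabled) {
        this.serdeSharingEnabled = serdeSharingEnabled;
    }

    public long addSerde(String lib) {
        long id = nextSerdeId++;
        serdes.put(id, lib);
        return id;
    }

    public void attach(long sdId, long serdeId) {
        sdToSerde.put(sdId, serdeId);
    }

    // Drops a storage descriptor and decides what happens to its serde record.
    public void dropStorageDescriptor(long sdId) {
        Long serdeId = sdToSerde.remove(sdId);
        if (serdeId == null) {
            return; // nothing to drop
        }
        // Old behavior: delete the serde unconditionally.
        // New behavior: delete it only when no other storage descriptor references it.
        if (!serdeSharingEnabled || !sdToSerde.containsValue(serdeId)) {
            serdes.remove(serdeId);
        }
    }

    public boolean hasSerde(long serdeId) { return serdes.containsKey(serdeId); }
}
```

With the flag off, dropping one of two storage descriptors that share a serde removes the serde record and leaves the other descriptor with a dangling reference, which is the mixed-version problem described above.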
The other thing we need to take care of is custom serdes. If a partition's serde is modified, we need to create a new record in SERDES so that we don't interfere with other partitions.
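One way to read this requirement is as copy-on-write: a modification to a shared serde creates a new record instead of changing the shared one. The sketch below illustrates that idea under this assumption; `SerdeCopyOnWriteSketch` and its methods are hypothetical, and the in-place update for an unshared serde is a choice made for this sketch rather than something the issue specifies.

```java
import java.util.HashMap;
import java.util.Map;

public class SerdeCopyOnWriteSketch {
    // serdeId -> serialization library; stands in for rows of SERDES.
    private final Map<Long, String> serdes = new HashMap<>();
    // sdId -> serdeId; stands in for the serde reference of each SDS row.
    private final Map<Long, Long> sdToSerde = new HashMap<>();
    private long nextSerdeId = 1;

    public long createSerde(String lib) {
        long id = nextSerdeId++;
        serdes.put(id, lib);
        return id;
    }

    public void attach(long sdId, long serdeId) {
        sdToSerde.put(sdId, serdeId);
    }

    // Changes the serde of one storage descriptor without touching the others.
    public void alterSerde(long sdId, String newLib) {
        Long current = sdToSerde.get(sdId);
        if (current == null) {
            throw new IllegalArgumentException("unknown storage descriptor " + sdId);
        }
        long refs = sdToSerde.values().stream().filter(id -> id.equals(current)).count();
        if (refs > 1) {
            // Shared record: create a new one and repoint only this descriptor.
            sdToSerde.put(sdId, createSerde(newLib));
        } else {
            // Exclusive record: safe to update in place.
            serdes.put(current, newLib);
        }
    }

    public String serdeOf(long sdId) {
        return serdes.get(sdToSerde.get(sdId));
    }

    public int serdeCount() { return serdes.size(); }
}
```

After `alterSerde` on one of two descriptors that share a record, the altered descriptor sees the new library while the other still sees the original.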