Hive / HIVE-25779

Deduplicate SerDe Info

Details

    Description

      The proposal is to reuse serde info in the same way we already reuse column descriptors (HIVE-2246).

      Currently, we store the metadata for partitions as PARTITIONS (N partitions) -> SDS (N locations) -> SERDES (N entries). However, all the SERDES entries for the partitions in a table are the same unless a serde is explicitly specified for a partition. That is, each storage descriptor has its own exclusive serde info, even though the partitions' serde infos are mostly identical to the table's. By reusing the serde info, we can save database storage and improve the performance of the queries HMS issues against the backend database.
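To make the storage saving concrete, here is a minimal sketch using an in-memory SQLite database. The three tables are heavily simplified stand-ins for the metastore tables named above (only a few columns are kept), and the `build` helper and warehouse paths are hypothetical, not HMS code.

```python
import sqlite3

SLIB = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

def build(n_partitions, share_serde):
    """Build a toy metastore and return (connection, number of SERDES rows)."""
    db = sqlite3.connect(":memory:")
    # Simplified schema: the real tables carry many more columns.
    db.executescript("""
        CREATE TABLE SERDES (SERDE_ID INTEGER PRIMARY KEY, SLIB TEXT);
        CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT,
                          SERDE_ID INTEGER REFERENCES SERDES(SERDE_ID));
        CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, PART_NAME TEXT,
                                 SD_ID INTEGER REFERENCES SDS(SD_ID));
    """)
    # The table itself owns one serde record.
    table_serde_id = db.execute(
        "INSERT INTO SERDES (SLIB) VALUES (?)", (SLIB,)).lastrowid
    for i in range(n_partitions):
        if share_serde:
            # Proposed layout: the partition points at the table's serde.
            serde_id = table_serde_id
        else:
            # Current layout: every storage descriptor gets its own copy.
            serde_id = db.execute(
                "INSERT INTO SERDES (SLIB) VALUES (?)", (SLIB,)).lastrowid
        sd_id = db.execute(
            "INSERT INTO SDS (LOCATION, SERDE_ID) VALUES (?, ?)",
            (f"/warehouse/t/p={i}", serde_id)).lastrowid
        db.execute(
            "INSERT INTO PARTITIONS (PART_NAME, SD_ID) VALUES (?, ?)",
            (f"p={i}", sd_id))
    count = db.execute("SELECT COUNT(*) FROM SERDES").fetchone()[0]
    return db, count
```

In this toy model, a table with N partitions holds N + 1 serde rows in the current layout (one per partition plus the table's own) and a single row when the serde is shared.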

      For backward compatibility, we also need to introduce a config for this feature, because there will be issues if an old HMS instance and a new HMS instance with this feature run together. With this feature, we need to check whether any other storage descriptor still references a serde before deleting it, but an old instance will just delete it.
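The mixed-version hazard can be sketched with two dictionaries standing in for the SDS and SERDES tables. Everything here is illustrative: the `dedup_enabled` flag is a hypothetical stand-in for the proposed config, and the function names are not taken from HMS.

```python
def drop_storage_descriptor(sds, serdes, sd_id, dedup_enabled):
    """Remove one storage descriptor and, when safe, its serde record.

    sds maps sd_id -> serde_id; serdes maps serde_id -> serde library name.
    Returns True if the serde record was deleted as well.
    """
    serde_id = sds.pop(sd_id)
    if dedup_enabled and serde_id in sds.values():
        # New behaviour: another storage descriptor still uses this serde.
        return False
    # Old behaviour (or last reference): delete the serde unconditionally.
    del serdes[serde_id]
    return True

def dangling(sds, serdes):
    """List storage descriptors whose serde record no longer exists."""
    return sorted(sd for sd, serde in sds.items() if serde not in serdes)

def demo(dedup_enabled):
    # Three partitions sharing one serde record, as the feature would store them.
    sds = {1: 10, 2: 10, 3: 10}
    serdes = {10: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
    drop_storage_descriptor(sds, serdes, 1, dedup_enabled)
    return dangling(sds, serdes)
```

Calling `demo(True)` leaves no dangling references, while `demo(False)` models an old instance dropping one partition and orphaning the other two. This is why the feature would need to stay off until every instance understands shared serde records.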

      The other thing we need to take care of is custom serdes. If a partition's serde is modified, we need to create a new record in SERDES so that we don't interfere with other partitions.
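One way to read the custom-serde requirement is as copy-on-write: a shared record is never edited in place. The sketch below assumes that interpretation; the dictionaries, the `set_partition_serde` name, and the id allocation are all hypothetical simplifications.

```python
def set_partition_serde(sds, serdes, sd_id, new_slib):
    """Change the serde of one storage descriptor without touching others.

    sds maps sd_id -> serde_id; serdes maps serde_id -> serde library name.
    Returns the serde_id the storage descriptor points at afterwards.
    """
    current_id = sds[sd_id]
    if serdes[current_id] == new_slib:
        return current_id  # nothing to change
    shared = any(other != sd_id and serde == current_id
                 for other, serde in sds.items())
    if not shared:
        # Exclusive record: safe to update in place.
        serdes[current_id] = new_slib
        return current_id
    # Shared record: create a new SERDES entry and repoint only this SD.
    new_id = max(serdes) + 1  # toy id allocation, not how a database does it
    serdes[new_id] = new_slib
    sds[sd_id] = new_id
    return new_id
```

For example, with `sds = {1: 10, 2: 10}` and a single serde record 10, changing the serde of storage descriptor 2 allocates a new record for it and leaves descriptor 1 and record 10 untouched.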

          People

            hsnusonic Yu-Wen Lai

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 4h 20m