HCatalog / HCATALOG-237

Switch from using StorageDrivers to SerDes to do data (de)serialization


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4
    • Fix Version/s: 0.4
    • Component/s: None
    • Labels: None

    Description

      HCatalog started by creating its own classes, InputStorageDriver and OutputStorageDriver, to convert data between the storage layer's Input/OutputFormats and the HCatInput/OutputFormats. These classes provide functionality very similar to Hive's SerDe, though with a much simpler interface.
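      Both kinds of class do the same basic job: on read, turn one storage-layer value into a row of column values, and on write, turn a row back into a storage-layer value. The Java sketch below models that shared role with a hypothetical RowConverter interface and a delimited-text implementation. These names are illustrative assumptions, not HCatalog's actual InputStorageDriver/OutputStorageDriver signatures, and real drivers work with Hadoop Writables and HCatRecords rather than plain strings and lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model of the storage-driver role: convert one
 * storage-layer value (here, a line of text) into a row and back.
 * This is NOT HCatalog's real storage driver API.
 */
interface RowConverter<W> {
    /** Read path: storage-layer value -> row of column values. */
    List<Object> toRow(W storageValue);

    /** Write path: row of column values -> storage-layer value. */
    W fromRow(List<Object> row);
}

/** Example converter for delimited text such as "alice\t42". */
class DelimitedTextConverter implements RowConverter<String> {
    private final String delimiter;

    DelimitedTextConverter(String delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public List<Object> toRow(String storageValue) {
        // Quote the delimiter so it is matched literally, and pass -1 so
        // trailing empty columns are kept instead of silently dropped.
        String[] fields = storageValue.split(Pattern.quote(delimiter), -1);
        return new ArrayList<>(Arrays.asList((Object[]) fields));
    }

    @Override
    public String fromRow(List<Object> row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            Object value = row.get(i);
            // Nulls are written as empty fields in this sketch.
            sb.append(value == null ? "" : value.toString());
        }
        return sb.toString();
    }
}
```

      A converter built with a tab delimiter turns "alice\t42\t" into a three-column row whose last column is empty, and writes that row back out unchanged.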

      Using separate classes has caused several problems for HCatalog. First, HCatalog cannot make use of existing Hive SerDes. Second, it has forced us to create HCat-specific extensions of Hive interfaces (such as StorageHandler) in order to provide the StorageDescriptors. Third, users who already have Hive installed cannot use HCatalog without first updating every partition in their metastore with storage driver information.

      I propose that we switch to SerDes for this. To offset the SerDe interface's greater complexity, we can provide adapter classes that make writing a new SerDe easy in simple cases.
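      One way such adapter classes might look, as a minimal sketch under stated assumptions: SimpleSerDe is a hypothetical, heavily simplified stand-in for the shape of Hive's SerDe contract (table properties passed to initialize, then per-value deserialize and serialize), and SimpleSerDeAdapter is a hypothetical base class that handles the shared plumbing so a simple format only implements parse and format. Hive's real interface uses Configuration, Writable, and ObjectInspector, none of which are modeled here; the only Hive detail borrowed is the "columns" table property, a comma-separated list of column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical, simplified stand-in for the shape of a SerDe: it is
 * configured from table properties, describes its columns, and converts
 * values in both directions. NOT Hive's actual SerDe interface.
 */
interface SimpleSerDe {
    void initialize(Properties tableProperties);
    List<String> getColumnNames();
    List<Object> deserialize(String storageValue);
    String serialize(List<Object> row);
}

/**
 * Hypothetical adapter: reads column names from table properties and checks
 * row width, so a simple format only supplies parse() and format().
 */
abstract class SimpleSerDeAdapter implements SimpleSerDe {
    private List<String> columns = Collections.emptyList();

    @Override
    public final void initialize(Properties tableProperties) {
        // Modeled on Hive's "columns" table property (comma-separated names).
        String names = tableProperties.getProperty("columns", "");
        columns = names.isEmpty()
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(Arrays.asList(names.split(",")));
    }

    @Override
    public List<String> getColumnNames() {
        return columns;
    }

    @Override
    public final List<Object> deserialize(String storageValue) {
        List<Object> row = parse(storageValue);
        checkWidth(row);
        return row;
    }

    @Override
    public final String serialize(List<Object> row) {
        checkWidth(row);
        return format(row);
    }

    private void checkWidth(List<Object> row) {
        if (row.size() != columns.size()) {
            throw new IllegalArgumentException(
                    "expected " + columns.size() + " columns, got " + row.size());
        }
    }

    /** The only two methods a simple format has to implement. */
    protected abstract List<Object> parse(String storageValue);

    protected abstract String format(List<Object> row);
}

/** Example: a comma-delimited format written against the adapter. */
class CommaDelimitedSerDe extends SimpleSerDeAdapter {
    @Override
    protected List<Object> parse(String storageValue) {
        // -1 keeps trailing empty fields.
        return new ArrayList<>(Arrays.asList((Object[]) storageValue.split(",", -1)));
    }

    @Override
    protected String format(List<Object> row) {
        List<String> parts = new ArrayList<>();
        for (Object value : row) {
            parts.add(value == null ? "" : value.toString());
        }
        return String.join(",", parts);
    }
}
```

      With this split, a new delimited format takes only a few lines, while property handling and validation live in one place. Initialized with columns=name,age, CommaDelimitedSerDe turns "alice,42" into the row [alice, 42] and rejects a one-field value with an IllegalArgumentException.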


            People

              Assignee: Unassigned
              Reporter: Alan Gates (gates)
              Votes: 3
              Watchers: 10
