Hive / HIVE-553

Add BinarySortableSerDe to Hive

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Currently the most popular SerDe in Hive is LazySimpleSerDe. LazySimpleSerDe has the benefit of being simple (it stores data in text format), but its performance may suffer in the following cases:
      1. Double values are stored in text format, which is very space-inefficient, and both serialization and deserialization are slow;
      2. For complex-type columns with many levels of nesting, the buffer is scanned once per level, which is very inefficient.
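
To make the first point concrete, here is a minimal Java sketch comparing the size of one double written as decimal text with the same value written as its raw 8-byte IEEE 754 representation. The class and method names are hypothetical and are not taken from LazySimpleSerDe or from the attached patches; the sketch only illustrates the size difference.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not Hive code).
final class DoubleSizeSketch {
    // Number of bytes needed when the double is written as decimal text.
    static int textSize(double v) {
        return Double.toString(v).getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed when the double is written as its raw IEEE 754 bits.
    static int binarySize(double v) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(v).array().length;
    }

    public static void main(String[] args) {
        double v = 3.141592653589793;
        System.out.println("text bytes:   " + textSize(v));
        System.out.println("binary bytes: " + binarySize(v));
    }
}
```

A text reader also has to parse the digits back into a double, whereas the binary form can be read directly with `ByteBuffer.getDouble`.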

      We should add a SerDe that stores the data in binary format. The format should have the following properties:
      1. Compact: it should be space-efficient;
      2. Fast: it should be efficient to deserialize, especially for double values and complex types;
      3. It should support serializing NULL values.
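
The three properties above can be illustrated for a single nullable double column. The following Java sketch is one possible encoding, not necessarily the layout used in the attached patches: the one-byte NULL marker, its values, and all class and method names are assumptions made for this example. The bit manipulation is a common technique for making the unsigned byte order of the encoded form agree with numeric order, which is what a "sortable" binary format needs.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a compact, sortable, NULL-aware double encoding.
final class SortableDoubleSketch {
    static final byte NULL_MARKER = 0;      // assumed marker value for NULL
    static final byte NOT_NULL_MARKER = 1;  // assumed marker value for non-NULL

    // NULL -> 1 byte; non-NULL -> 1 marker byte + 8 value bytes (big-endian).
    static byte[] encode(Double value) {
        if (value == null) {
            return new byte[] {NULL_MARKER};
        }
        long bits = Double.doubleToLongBits(value);
        // Negative values: flip all bits. Non-negative values: flip the sign bit.
        // Afterwards, unsigned byte-wise order matches numeric order.
        bits = (bits < 0) ? ~bits : (bits ^ Long.MIN_VALUE);
        return ByteBuffer.allocate(1 + Double.BYTES)
                .put(NOT_NULL_MARKER)
                .putLong(bits)
                .array();
    }

    static Double decode(byte[] bytes) {
        if (bytes[0] == NULL_MARKER) {
            return null;
        }
        long bits = ByteBuffer.wrap(bytes, 1, Double.BYTES).getLong();
        // Undo the transformation: a set top bit means the original was non-negative.
        bits = (bits < 0) ? (bits ^ Long.MIN_VALUE) : ~bits;
        return Double.longBitsToDouble(bits);
    }

    // Lexicographic comparison treating each byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(-2.5)));
        System.out.println(decode(encode(null)));
        System.out.println(compareBytes(encode(-2.0), encode(-1.0)) < 0);
        System.out.println(compareBytes(encode(null), encode(-2.0)) < 0);
    }
}
```

In this sketch a NULL costs one byte and a non-NULL double costs nine, decoding needs no text parsing, and NULL sorts before every non-NULL value. Those choices are design decisions of the example, not requirements stated in this issue.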

      Attachments

        1. HIVE-553.2.patch
          44 kB
          Zheng Shao
        2. HIVE-553.3.patch
          41 kB
          Zheng Shao
        3. HIVE-553.4.patch
          42 kB
          Zheng Shao
        4. HIVE-553.5.patch
          44 kB
          Zheng Shao
        5. HIVE-553.6.patch
          323 kB
          Zheng Shao

            People

              Assignee: Zheng Shao (zshao)
              Reporter: Zheng Shao (zshao)
              Votes: 1
              Watchers: 2