  1. Hive
  2. HIVE-266

Improve SerDe performance by using Text instead of String


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • 0.4.0
    • 0.4.0
    • None
    • Incompatible change, Reviewed

    Description

      A recent performance study showed that two places in the Hive code account for a large percentage of CPU usage:
      1. String.getBytes() (UTF-8 encoding)
      2. String.split()

      We should replace String with the Text object in order to:
      1. Avoid UTF-8 decoding and encoding
      2. Reuse the Text object instead of creating a new object for each column in each row, as String.split() does

      This is expected to give a big (20%+) performance improvement to Hive.
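
      The idea can be sketched without Hadoop on the classpath. The example below is only an illustration under stated assumptions, not the code from the attached patches: ByteSpan is a hypothetical stand-in for a reusable byte container such as Text, and ByteRowSplitter.split is a hypothetical helper. It scans the raw UTF-8 bytes of a row for a single-byte delimiter and copies each field into a caller-owned object, so no String is decoded and no per-column object is allocated on the hot path.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a reusable byte container such as Hadoop's Text. */
final class ByteSpan {
    private byte[] bytes = new byte[0];
    private int length;

    /** Copies a slice into this object's buffer, growing it only when too small. */
    void set(byte[] src, int start, int len) {
        if (bytes.length < len) {
            bytes = new byte[len];
        }
        System.arraycopy(src, start, bytes, 0, len);
        length = len;
    }

    int getLength() {
        return length;
    }

    /** Decodes to a String; meant for display, not for the per-row hot path. */
    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
}

/** Hypothetical helper: splits a row of raw bytes without creating Strings. */
final class ByteRowSplitter {
    /**
     * Splits row[0..rowLen) on a single-byte delimiter into the caller's
     * reusable columns. Returns the number of columns filled, capped at
     * columns.length. Scanning bytes is safe for UTF-8 when the delimiter is
     * below 0x80, because such bytes never occur inside a multi-byte sequence.
     */
    static int split(byte[] row, int rowLen, byte delimiter, ByteSpan[] columns) {
        int count = 0;
        int fieldStart = 0;
        for (int i = 0; i <= rowLen && count < columns.length; i++) {
            if (i == rowLen || row[i] == delimiter) {
                columns[count++].set(row, fieldStart, i - fieldStart);
                fieldStart = i + 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The column objects are allocated once and reused for every row.
        ByteSpan[] columns = { new ByteSpan(), new ByteSpan(), new ByteSpan() };
        byte delimiter = 1; // Ctrl-A, chosen here as an example delimiter
        String[] rows = { "1\u0001alice\u000130", "2\u0001bob\u000125" };
        for (String r : rows) {
            // In a real reader the bytes would come straight from the input
            // stream; getBytes() is used here only to build test data.
            byte[] raw = r.getBytes(StandardCharsets.UTF_8);
            int n = split(raw, raw.length, delimiter, columns);
            for (int c = 0; c < n; c++) {
                System.out.println("column " + c + " = " + columns[c]);
            }
        }
    }
}
```

      Unlike String.split(), this sketch keeps trailing empty fields and ignores any fields beyond columns.length; a real SerDe would have to decide how to handle both cases.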

      Attachments

        1. HIVE-266.1.patch
          1.27 MB
          Zheng Shao
        2. HIVE-266.2.patch
          1.28 MB
          Zheng Shao
        3. HIVE-266.3.patch
          1.37 MB
          Zheng Shao
        4. HIVE-266.4.patch
          1.40 MB
          Zheng Shao
        5. HIVE-266.5.patch
          1.40 MB
          Zheng Shao
        6. HIVE-266.6.patch
          1.40 MB
          Zheng Shao
        7. HIVE-266.7.patch
          1.41 MB
          Zheng Shao
        8. HIVE-266.8.patch
          914 kB
          Zheng Shao

            People

              Assignee: Zheng Shao (zshao)
              Reporter: Zheng Shao (zshao)
              Votes: 0
              Watchers: 3
