Hive / HIVE-7144

GC pressure during ORC StringDictionary writes


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.14.0
    • Component/s: File Formats
    • Environment: ORC Table ~ 12 string columns

    • Use Text writables directly in ORC dictionaries to avoid String allocations.

    Description

      When the ORC string dictionary writer writes data out, it suffers from poor GC performance because of several allocations inside the write loop.

      The conversions are as follows:

      StringTreeWriter::getStringValue() performs two conversions:

      LazyString -> Text (LazyString::getWritableObject)
      Text -> String (LazyStringObjectInspector::getPrimitiveJavaObject)

      Then StringRedBlackTree::add() performs a third conversion:

      String -> Text
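The three conversions above can be modeled in plain Java to show the round trip. This is a minimal sketch under stated assumptions: plain byte[] and String stand in for Hive's LazyString, Hadoop's Text and java.lang.String, and the class and method names (OldStringPathSketch, getStringValue, addToDictionary) are hypothetical, chosen only to mirror the roles named in the description. It is not the actual Hive code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical model of the pre-fix ORC string write path.
 * byte[] and String stand in for LazyString, Text and String;
 * no Hive or Hadoop classes are used.
 */
public class OldStringPathSketch {
    // One increment per conversion; each conversion creates at least one new object.
    static int conversions = 0;

    /** Models getStringValue(): LazyString -> Text -> String (two conversions). */
    static String getStringValue(byte[] row, int start, int length) {
        // LazyString -> Text: modeled here as copying the row's bytes into a fresh buffer.
        byte[] textBytes = Arrays.copyOfRange(row, start, start + length);
        conversions++;
        // Text -> String: decode the UTF-8 bytes into a new String.
        String value = new String(textBytes, StandardCharsets.UTF_8);
        conversions++;
        return value;
    }

    /** Models StringRedBlackTree.add(String): String -> Text (one conversion). */
    static byte[] addToDictionary(String value) {
        // String -> Text: re-encode the String back into new UTF-8 bytes.
        byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
        conversions++;
        return encoded;
    }

    public static void main(String[] args) {
        byte[] row = "id=42,name=orc".getBytes(StandardCharsets.UTF_8);
        // Bytes 11..13 of the row hold the string value "orc".
        byte[] stored = addToDictionary(getStringValue(row, 11, 3));
        System.out.println(new String(stored, StandardCharsets.UTF_8)
                + " after " + conversions + " conversions");
    }
}
```

Running main prints `orc after 3 conversions`: the value ends up as the same UTF-8 bytes it started as, so the decoded String and the re-encoded byte[] exist only to be discarded, once per string value written.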

      Together these conversions create GC pressure through unnecessary String and byte[] allocations.
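The fix summarized in the issue, keeping values as UTF-8 bytes (Text) all the way into the dictionary, can be illustrated with a byte-keyed dictionary whose add method takes a (bytes, offset, length) slice. This is a hypothetical sketch, not the HIVE-7144 patch: ByteSliceDictionary and its methods are invented names, a sorted array of ids with binary search stands in for StringRedBlackTree, and one growing byte[] stands in for the dictionary's shared key storage. Keys are compared as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical byte-keyed dictionary: values are added as byte slices,
 * so no String is created on the write path.
 */
public class ByteSliceDictionary {
    private byte[] keyData = new byte[64];  // all distinct keys, stored back to back
    private int keyDataUsed = 0;
    private int[] keyStart = new int[16];   // per-id start offset in keyData
    private int[] keyLength = new int[16];  // per-id key length
    private int[] sorted = new int[16];     // ids ordered by their key bytes
    private int size = 0;

    /** Returns the id for the slice, copying its bytes only if the key is new. */
    public int add(byte[] bytes, int offset, int length) {
        int lo = 0, hi = size - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int id = sorted[mid];
            int cmp = Arrays.compareUnsigned(keyData, keyStart[id], keyStart[id] + keyLength[id],
                    bytes, offset, offset + length);
            if (cmp == 0) return id;  // existing key: returns without allocating
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        ensureCapacity(length);
        int id = size;
        System.arraycopy(bytes, offset, keyData, keyDataUsed, length);
        keyStart[id] = keyDataUsed;
        keyLength[id] = length;
        keyDataUsed += length;
        // Shift ids right so the new id lands at its sorted position lo.
        System.arraycopy(sorted, lo, sorted, lo + 1, size - lo);
        sorted[lo] = id;
        size++;
        return id;
    }

    public int size() { return size; }

    /** Decodes a key for display only; add() never needs a String. */
    public String get(int id) {
        return new String(keyData, keyStart[id], keyLength[id], StandardCharsets.UTF_8);
    }

    /** Grows the shared buffers (amortized doubling) when a new key needs room. */
    private void ensureCapacity(int extraBytes) {
        if (keyDataUsed + extraBytes > keyData.length) {
            keyData = Arrays.copyOf(keyData, Math.max(keyData.length * 2, keyDataUsed + extraBytes));
        }
        if (size == keyStart.length) {
            keyStart = Arrays.copyOf(keyStart, size * 2);
            keyLength = Arrays.copyOf(keyLength, size * 2);
            sorted = Arrays.copyOf(sorted, size * 2);
        }
    }

    public static void main(String[] args) {
        // One row buffer holding comma-separated values (hypothetical data).
        byte[] row = "red,green,red,blue,green".getBytes(StandardCharsets.UTF_8);
        ByteSliceDictionary dict = new ByteSliceDictionary();
        int start = 0;
        for (int i = 0; i <= row.length; i++) {
            if (i == row.length || row[i] == ',') {
                System.out.println(dict.add(row, start, i - start));
                start = i + 1;
            }
        }
        System.out.println(dict.size() + " distinct keys");
    }
}
```

With this input, main prints the ids 0, 1, 0, 2, 1 and then `3 distinct keys`. A repeated value is resolved by comparing bytes in place, so its per-row cost is a few comparisons rather than a String decode and re-encode; memory is touched only when a new key arrives or a buffer has to grow.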

      Attachments

        1. HIVE-7144.3.patch
          7 kB
          Gopal Vijayaraghavan
        2. HIVE-7144.2.patch
          7 kB
          Gopal Vijayaraghavan
        3. HIVE-7144.1.patch
          7 kB
          Gopal Vijayaraghavan
        4. orc-string-write.png
          145 kB
          Gopal Vijayaraghavan


            People

              Assignee: Gopal Vijayaraghavan (gopalv)
              Reporter: Gopal Vijayaraghavan (gopalv)
              Votes: 0
              Watchers: 3
