Daffodil / DAFFODIL-2851

Excessive allocations in StringOfSpecifiedLengthMixin

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: Back End, Performance
    • Labels: None

    Description

      The StringOfSpecifiedLengthMixin passes in the value of the "maximumSimpleElementSizeInCharacters" tunable to the getSomeString function:

      https://github.com/apache/daffodil/blob/main/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/StringLengthParsers.scala#L89-L94

      The getSomeString function calls withLocalCharBuffer, which allocates a char buffer of that size into which the string is decoded. The tunable currently defaults to 1MB, which is large enough to be a noticeable contributor to allocations and CPU usage when profiling.

      Fortunately, the allocated char buffer is cached and reused for the rest of the parse (though each new parse allocates a new one), so it is only a one-time penalty per parse. However, most files will not contain any single string nearly that large, so this large allocation is usually just wasted.
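
      To make the cost concrete, the following is a hypothetical Java model of the behavior described above, not Daffodil's code: PerParseState, withLocalCharBuffer's signature, decodeSmallString, and the 1 << 20 stand-in for the tunable are all assumptions for illustration. A buffer sized to the tunable is allocated lazily, reused within one parse, and allocated again for every new parse, even though each decoded string is only a few characters long.

```java
import java.nio.CharBuffer;
import java.util.function.Function;

// Hypothetical model of the current behavior; PerParseState and
// withLocalCharBuffer are illustrative names, not Daffodil's real API.
public class PerParseBufferDemo {

    // Assumed stand-in for the maximumSimpleElementSizeInCharacters tunable.
    static final int MAX_SIMPLE_ELEMENT_SIZE_IN_CHARS = 1 << 20;

    // Counts char buffer allocations so the per-parse cost is visible.
    static int allocations = 0;

    // One instance per parse; the cached buffer does not outlive the parse.
    static final class PerParseState {
        private CharBuffer cached;

        <T> T withLocalCharBuffer(int size, Function<CharBuffer, T> body) {
            if (cached == null || cached.capacity() < size) {
                cached = CharBuffer.allocate(size); // heap char[]: 2 bytes per char
                allocations++;
            }
            cached.clear();
            return body.apply(cached);
        }
    }

    // "Decodes" (here: just copies) a tiny string, yet requests the full tunable size.
    static String decodeSmallString(PerParseState state, String input) {
        return state.withLocalCharBuffer(MAX_SIMPLE_ELEMENT_SIZE_IN_CHARS, cb -> {
            cb.put(input);
            cb.flip();
            return cb.toString();
        });
    }

    public static void main(String[] args) {
        for (int parse = 0; parse < 3; parse++) {
            PerParseState state = new PerParseState(); // each parse starts fresh
            for (int i = 0; i < 100; i++) {
                decodeSmallString(state, "abc");
            }
        }
        System.out.println("allocations: " + allocations);
    }
}
```

      In this model the allocation count tracks the number of parses, not the size of the data: three parses of 100 three-character strings still allocate three full-size buffers.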

      We should consider ways to reduce this allocation. Some options:

        • Simply decrease the tunable.
        • Change the logic so StringOfSpecifiedLength allocates a much smaller buffer and grows it only if needed, possibly taking bitLimit into account.
        • Share the buffer among different parses in a ThreadLocal, so we still allocate a large buffer, but the penalty is paid once per thread instead of once per parse.
        • Likely other options.
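
      A minimal sketch of how the second and third options might combine, again as hypothetical Java rather than a proposed patch: GrowableCharBufferPool, acquire, maxDecodableChars, and INITIAL_CAPACITY are invented names, and 64 is an arbitrary assumed starting size. The buffer starts small, grows geometrically only when a request needs more, is capped by the tunable, and lives in a ThreadLocal so it is reused across parses on the same thread. maxDecodableChars shows one way bitLimit could bound the request, assuming the encoding never yields more than one char per minBitsPerChar bits (8 for UTF-8, 16 for UTF-16).

```java
import java.nio.CharBuffer;

// Hypothetical sketch: a small, grow-on-demand char buffer shared per thread.
// GrowableCharBufferPool, acquire and maxDecodableChars are illustrative names,
// not part of Daffodil.
public final class GrowableCharBufferPool {

    // Assumed starting size; a real value would need to come from profiling.
    static final int INITIAL_CAPACITY = 64;

    // One buffer per thread, reused across parses running on that thread.
    private static final ThreadLocal<CharBuffer> BUFFER =
        ThreadLocal.withInitial(() -> CharBuffer.allocate(INITIAL_CAPACITY));

    private GrowableCharBufferPool() {}

    // Upper bound on chars that remainingBits can decode to, assuming the
    // encoding never yields more than one char per minBitsPerChar bits.
    static long maxDecodableChars(long remainingBits, int minBitsPerChar) {
        return remainingBits / minBitsPerChar;
    }

    // Returns this thread's buffer, cleared, with room for at least `needed`
    // chars. Grows geometrically (capped at maxChars) only when too small.
    static CharBuffer acquire(int needed, int maxChars) {
        if (needed > maxChars) {
            throw new IllegalArgumentException(
                "needed " + needed + " exceeds limit " + maxChars);
        }
        CharBuffer buf = BUFFER.get();
        if (buf.capacity() < needed) {
            long grown = Math.max(2L * buf.capacity(), needed);
            buf = CharBuffer.allocate((int) Math.min(grown, maxChars));
            BUFFER.set(buf);
        }
        buf.clear();
        return buf;
    }

    public static void main(String[] args) {
        int tunable = 1 << 20;   // assumed stand-in for the tunable's value
        long remainingBits = 80; // e.g. a string limited to 10 bytes
        int needed = (int) Math.min(maxDecodableChars(remainingBits, 8), tunable);
        CharBuffer buf = acquire(needed, tunable);
        buf.put("hello");
        buf.flip();
        System.out.println(buf + " (capacity " + buf.capacity() + ")");
    }
}
```

      One trade-off of the ThreadLocal approach: once a thread's buffer has grown, it keeps that size for the life of the thread (for example, a pooled worker thread). The sketch also assumes the shared buffer is never used reentrantly on the same thread while it is already in use.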


          People

            Assignee: Steve Lawrence (slawrence)
            Reporter: Steve Lawrence (slawrence)
            Votes: 0
            Watchers: 1
