UIMA
  1. UIMA
  2. UIMA-2385

Improve XmiCasDeserializer performance

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0SDK
    • Fix Version/s: 2.4.1SDK
    • Component/s: Core Java Framework
    • Labels:
      None

      Description

      I profiled an expensive CAS deserialization and found that 46% of the time was in CASImpl.ll_getFSForRef (the method that creates a FeatureStructure Java object for a CAS FS). All those calls were coming from deserializing arrays (of which this particular CAS has many).

      It is unnecessary to create FeatureStructure Java objects here. For non-array FSs, XmiCasDeserializer uses low-level CAS APIs in order to avoid this overhead. But for arrays, it currently does not use the low-level APIs.

      1. UIMA-2385.bps.patch
        0.6 kB
        Benjamin Segal

        Activity

        Hide
        Adam Lally added a comment -

        Well, I may have gotten some bad data from the profiler (yourkit), since deserialization time appears to be the same after my "fix".

        Show
        Adam Lally added a comment - Well, I may have gotten some bad data from the profiler (yourkit), since deserialization time appears to be the same after my "fix".
        Hide
        Benjamin Segal added a comment -

        Upon further analysis, the poor performance is due to (slowly) expanding the CAS (heaps). IF the XMI CAS contains hundreds of thousands to millions of 8, 16, or 64 bit feature values which are being added roughly one (or a few) at a time (through the XMI deserialization), then there is a high CAS expansion cost with the current implementation.

        The existing byte/short/long heap expansion algorithm is defined in CommonAuxHeap. The idea is that the heap will exponentially grow until a threshold is reached (DEFAULT_HEAP_MULT_LIMIT), after which point, exponential growth is replaced with a linear growth. All that sounds reasonable, but, DEFAULT_HEAP_MULT_LIMIT is only defined to be 1024. Once the array grows to 1024, the algorithm only expands by 1024 entries at a time.

        In our case, the XMI deserialization is only adding a few feature values at a time, and slowly expanding out to 1.5 million 18 seconds later (for 5 million, it's over 2 minutes).

        Furthermore, the byte/short/long heap seeding values (dealing with CAS expansion) are not currently configurable or exposed through the CAS (CASImpl) or CAS creation utility.

        I think, minimally, we should consider increasing the DEFAULT_HEAP_MULT_LIMIT to something a bit larger to allow for quicker expansion. The regular (32-bit) Heap uses a default size of 500,000. For what it's worth, I changed DEFAULT_HEAP_MULT_LIMIT to 512K, and cut the deserialization time by 2/3 or more (from 18 seconds down to 6 seconds, and 140 seconds down to 20 seconds). From a memory footprint perspective, that would allow an exponential expansion up to 4MBs (512K*8), and then linear expansion from there, which seems reasonable.

        We could also consider exposing DEFAULT_HEAP_MULT_LIMIT as a "setable" property (something analogous to CAS_INITIAL_HEAP_SIZE), and allow the cas creation utility to honor the requested limit.

        It should also be noted that binary CAS deserialization does not suffer this same fate, as the total heap sizes are self described within the binary blob, so the total heaps sizes are known prior to allocating the (heap) arrays.

        And it's also worth noting that this overhead may or may not be problematic, depending on the various types of use cases.

        Show
        Benjamin Segal added a comment - Upon further analysis, the poor performance is due to (slowly) expanding the CAS (heaps). IF the XMI CAS contains hundreds of thousands to millions of 8, 16, or 64 bit feature values which are being added roughly one (or a few) at a time (through the XMI deserialization), then there is a high CAS expansion cost with the current implementation. The existing byte/short/long heap expansion algorithm is defined in CommonAuxHeap. The idea is that the heap will exponentially grow until a threshold is reached (DEFAULT_HEAP_MULT_LIMIT), after which point, exponential growth is replaced with a linear growth. All that sounds reasonable, but , DEFAULT_HEAP_MULT_LIMIT is only defined to be 1024. Once the array grows to 1024, the algorithm only expands by 1024 entries at a time. In our case, the XMI deserialization is only adding a few feature values at a time, and slowly expanding out to 1.5 million 18 seconds later (for 5 million, it's over 2 minutes). Furthermore, the byte/short/long heap seeding values (dealing with CAS expansion) are not currently configurable or exposed through the CAS (CASImpl) or CAS creation utility. I think, minimally, we should consider increasing the DEFAULT_HEAP_MULT_LIMIT to something a bit larger to allow for quicker expansion. The regular (32-bit) Heap uses a default size of 500,000. For what it's worth, I changed DEFAULT_HEAP_MULT_LIMIT to 512K, and cut the deserialization time by 2/3 or more (from 18 seconds down to 6 seconds, and 140 seconds down to 20 seconds). From a memory footprint perspective, that would allow an exponential expansion up to 4MBs (512K*8), and then linear expansion from there, which seems reasonable. We could also consider exposing DEFAULT_HEAP_MULT_LIMIT as a "setable" property (something analogous to CAS_INITIAL_HEAP_SIZE), and allow the cas creation utility to honor the requested limit. It should also be noted that binary CAS deserialization does not suffer this same fate, as the total heap sizes are self described within the binary blob, so the total heaps sizes are known prior to allocating the (heap) arrays. And it's also worth noting that this overhead may or may not be problematic, depending on the various types of use cases.
        Hide
        Adam Lally added a comment -

        Excellent. I completely agree with increasing the DEFAULT_HEAP_MULTI_LIMIT to 512k. Please submit a patch. I don't have any particularly strong feeling that it needs to be configurable, but if you want to do that I wouldn't object.

        Show
        Adam Lally added a comment - Excellent. I completely agree with increasing the DEFAULT_HEAP_MULTI_LIMIT to 512k. Please submit a patch. I don't have any particularly strong feeling that it needs to be configurable, but if you want to do that I wouldn't object.
        Hide
        deprecated (use "rec") added a comment -

        I think it could be useful as a tunable parameter in scenarios where many CASes are in memory at the same time. Is there any particular reason that it should not be configurable?

        Show
        deprecated (use "rec") added a comment - I think it could be useful as a tunable parameter in scenarios where many CASes are in memory at the same time. Is there any particular reason that it should not be configurable?
        Hide
        Adam Lally added a comment -

        I consider this analogous to how an ArrayList's capacity grows in Java. They provided some reasonable implementation that seems to work for everyone, and they don't expose ways to tune it.

        With the multi limit at 512K we are talking about at most 4MB of unused space, and I don't consider that excessive. As I said though, I wouldn't object if someone made this parameterizable.

        Show
        Adam Lally added a comment - I consider this analogous to how an ArrayList's capacity grows in Java. They provided some reasonable implementation that seems to work for everyone, and they don't expose ways to tune it. With the multi limit at 512K we are talking about at most 4MB of unused space, and I don't consider that excessive. As I said though, I wouldn't object if someone made this parameterizable.
        Hide
        Benjamin Segal added a comment -

        I attached a patch that changes DEFAULT_HEAP_MULTI_LIMIT to 512K. I did not include function to make this parameterizable (though, could be done in the future, if necessary).

        Show
        Benjamin Segal added a comment - I attached a patch that changes DEFAULT_HEAP_MULTI_LIMIT to 512K. I did not include function to make this parameterizable (though, could be done in the future, if necessary).
        Hide
        Marshall Schor added a comment -

        Applied patch.

        Show
        Marshall Schor added a comment - Applied patch.

          People

          • Assignee:
            Marshall Schor
            Reporter:
            Adam Lally
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development