OpenJPA / OPENJPA-439

Performance degradation in multi-transaction operations

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.9.6, 0.9.7, 1.0.0, 1.0.1, 1.0.2, 1.1.0
    • Fix Version/s: 1.1.0
    • Component/s: kernel
    • Labels:
      None

      Description

      Reusing a Broker for multiple transactions / persistence contexts demonstrates a performance degradation, possibly due to explicit calls to clear sets and maps, rather than just dereferencing them.

      Discussion: http://www.nabble.com/Performance-drop-in-AbstractHashedMap.clear%28%29-tf4769771.html#a13656730

      1. testcaseperformance.zip
        4 kB
        Christiaan
      2. performance testcase results.zip
        74 kB
        Christiaan
      3. OPENJPA-439-patch.jar
        45 kB
        Patrick Linskey
      4. OPENJPA-439.patch
        5 kB
        Patrick Linskey

        Activity

        Christiaan added a comment -

        Created issue https://issues.apache.org/jira/browse/OPENJPA-441 for this

        Patrick Linskey added a comment -

        Great! Thanks for running the numbers.

        > There is still one question pending. As mentioned, once all objects to
        > be deleted are loaded into memory (40 mb), performing pm.deleteAll()
        > increases the memory by another 45 mb. Is this as expected or should
        > this be investigated as well?

        I think it's definitely worth investigating. I'm going to commit the changes to this issue and mark it as resolved, though; can you open a new issue for the potential memory leak?

        Christiaan added a comment -

        I did a rerun of the testcase with the provided patch (I had to make some changes to the query, since Kodo 4.1.4 results in an exception with this OpenJPA version). The results are great:

        Collecting 41371 objects took: 0:0:12:61 (12061.0 ms)
        Deleting objects took: 0:0:6:356 (6356.0 ms)
        duration 1st run: 0:0:18:417 (18417.0 ms)
        Collecting 41371 objects took: 0:0:10:713 (10713.0 ms)
        Deleting objects took: 0:0:5:751 (5751.0 ms)
        duration 2nd run: 0:0:16:464 (16464.0 ms)

        As you can see, collecting the objects in the second run has now been reduced from 22 seconds to 10 seconds (see the output in the previous comment), which is even faster than the first run! The hotspots for the first and second run are now the same, and the top 10 hotspots are all java.* methods (the first OpenJPA hotspot is org.apache.openjpa.jdbc.sql.SelectImpl.getTableIndex, with only 0.3% self time). I also had a look at the memory usage, since, as mentioned, the second run had a 15 mb increase in memory compared to the first run. This 15 mb increase is now solved as well, so the second run has the same memory pattern as the first run.

        There is still one question pending. As mentioned, once all objects to be deleted are loaded into memory (40 mb), performing pm.deleteAll() increases the memory by another 45 mb. Is this as expected, or should this be investigated as well?

        Christiaan added a comment -

        Attached you will find a testcase to reproduce the performance drop. The testcase should be run twice, once with generateLargeDataset set to true and once with it set to false. The first run generates the data, which takes about 15 minutes on my machine; the second run performs the actual performance test (with the generated data this can be executed 5 times). The test consists of collecting objects for deletion and then deleting them.

        The output for the testcase:
        Collecting 41371 objects took: 0:0:12:799 (12799.0 ms)
        Deleting objects took: 0:0:6:267 (6267.0 ms)
        duration 1st run: 0:0:19:66 (19066.0 ms)
        Collecting 41371 objects took: 0:0:22:730 (22730.0 ms)
        Deleting objects took: 0:0:5:569 (5569.0 ms)
        duration 2nd run: 0:0:28:299 (28299.0 ms)

        A couple of things I noticed:
        1) The performance drop only occurs when a large number of objects is involved (>20,000 objects). When the number is small there is no performance drop.
        2) The factor of the performance drop grows with the number of objects, e.g. 40,000 objects show a performance drop of a factor 2, 50,000 objects a factor 4.
        3) The performance drop occurs while traversing the object tree, not during the actual delete (which is actually faster in the second run).

        Attached is also the profiler data for this testcase. As you can see, the performance drop is caused by AbstractHashedMap.clear(). clear() iterates over all entries and sets them to null. The question is why iterating is so much slower in the second run when the same number of objects is involved. I can imagine that leaving the hashmap's data structure intact and adding objects with new identities will grow the data structure, and thus have an impact on iterating over it, even if the number of entries stays the same. But this is just my assumption.
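        The clear() behavior described here can be sketched as follows. This is a simplified, illustrative model of how a hashed map's clear() walks its bucket array; it is not the actual Commons Collections source:

```java
// Illustrative model (not the real AbstractHashedMap source): clear()
// walks the entire bucket array and nulls every slot, so its cost is
// O(capacity), not O(size). The bucket array grows on resize but never
// shrinks, so a map that once held ~50,000 entries pays the large-table
// cost on every later clear(), even when it is nearly empty.
public class ClearCost {
    static final class Entry { Object key, value; Entry next; }

    static int clear(Entry[] table) {
        int bucketsVisited = 0;
        for (int i = table.length - 1; i >= 0; i--) {
            table[i] = null;   // drop the whole chain for GC, filled or not
            bucketsVisited++;
        }
        return bucketsVisited;
    }

    public static void main(String[] args) {
        // A table that resized to hold ~50,000 entries has 65,536 buckets;
        // clear() touches all of them regardless of how many are in use.
        Entry[] grownTable = new Entry[65_536];
        System.out.println("buckets visited: " + clear(grownTable));
    }
}
```

        If that model is accurate, it would explain the observed pattern: the second run reuses tables that grew during the first run, so every clear() pays the grown-table cost.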

        One other interesting thing to note is that after all objects have been collected and pm.deleteAll() + commit() is called, there is quite an increase in memory usage. After collecting the objects the memory usage is 40 mb; after committing the deleteAll() it is 91 mb. So the memory usage more than doubles, even though all objects to delete have already been loaded into memory! This probably needs to be investigated in a separate issue. After the commit, the memory usage nicely drops back to its level when the transaction started. In the second run, the memory usage peaks at 105 mb, but this 15 mb increase might be related to the implementation of clear().

        Btw, if you could send me the patched jar file I could run the test as well.

        Christiaan added a comment -

        profiler screenshots

        Patrick Linskey added a comment -

        Changed most of the BrokerImpl collections to be recreated rather than cleared. I left a few probably-small collections as-is.

        It'd be great to get some numbers about the impact of this patch.
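        The pattern of the change is roughly the following sketch. The field and method names here are hypothetical, not copied from BrokerImpl:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the patch's pattern (names are illustrative,
// not from the OpenJPA source): instead of clearing the per-transaction
// collections in place, drop them and allocate fresh ones. Recreation is
// O(1) and resets capacity, so ending a transaction no longer depends on
// how large the collections grew in earlier transactions.
public class BrokerSketch {
    private Map<Object, Object> managedStates = new HashMap<>();
    private Set<Object> transactionalCache = new HashSet<>();

    void endTransactionOld() {
        managedStates.clear();        // O(capacity) on every transaction end
        transactionalCache.clear();
    }

    void endTransactionNew() {
        managedStates = new HashMap<>();      // O(1); old table becomes garbage
        transactionalCache = new HashSet<>(); // capacity back to the default
    }

    public static void main(String[] args) {
        BrokerSketch b = new BrokerSketch();
        b.endTransactionNew();
        System.out.println("collections recreated: " + b.managedStates.size());
    }
}
```

        The trade-off is extra allocation and GC pressure per transaction, which is why a few probably-small collections were left to be cleared in place.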


  People

          • Assignee: Unassigned
          • Reporter: Patrick Linskey