Attached you will find a testcase to reproduce the performance drop. The testcase should be run twice: first with generateLargeDataset set to true, then with it set to false. The first run generates the data, which takes about 15 minutes on my machine; the second run performs the actual performance test (with the generated data it can be executed 5 times). The test consists of collecting objects for deletion and then deleting them.
The output for the testcase:
Collecting 41371 objects took: 0:0:12:799 (12799.0 ms)
Deleting objects took: 0:0:6:267 (6267.0 ms)
duration 1st run: 0:0:19:66 (19066.0 ms)
Collecting 41371 objects took: 0:0:22:730 (22730.0 ms)
Deleting objects took: 0:0:5:569 (5569.0 ms)
duration 2nd run: 0:0:28:299 (28299.0 ms)
A couple of things I noticed:
1) The performance drop only occurs when a large number of objects is involved (>20.000 objects). With a small number of objects there is no performance drop.
2) The factor of the performance drop increases with the number of objects, e.g. 40.000 objects are slowed down by a factor of 2, 50.000 objects by a factor of 4.
3) The performance drop occurs while traversing the object tree, not in the actual delete (which is actually faster in the second run).
Attached is also the profiler data for this test case. As you can see, the performance drop is caused by AbstractHashedMap.clear(). clear() iterates over all entries and sets them to null. The question is: why is iterating so much slower in the second run when the same number of objects is involved? I can imagine that leaving the hashmap's data structure intact and adding objects with new identities makes the data structure grow, which would slow down iterating over it even if the number of entries stays the same. But this is just my assumption.
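To make the assumption above concrete, here is a minimal sketch of a chained hash table whose bucket array grows but never shrinks. It is not the Commons Collections source; the class BucketTableSketch and all of its methods are hypothetical names made up for illustration. It only shows that, under this assumption, the cost of clear() follows the size of the bucket array rather than the number of live entries. Whether this is what actually happens in the profiled run is not verified.

```java
/** Hypothetical illustration only; not the real AbstractHashedMap. */
class BucketTableSketch {
    /** Singly linked chain node holding an int key. */
    private static final class Entry {
        final int key;
        Entry next;
        Entry(int key, Entry next) { this.key = key; this.next = next; }
    }

    private Entry[] buckets; // length is assumed to be a power of two
    private int size;

    BucketTableSketch(int initialCapacity) {
        buckets = new Entry[initialCapacity];
    }

    void add(int key) {
        int idx = key & (buckets.length - 1);
        for (Entry e = buckets[idx]; e != null; e = e.next) {
            if (e.key == key) return; // already present
        }
        buckets[idx] = new Entry(key, buckets[idx]);
        if (++size > buckets.length * 3 / 4) resize(); // grow at 75% load
    }

    /** Doubles the bucket array and rehashes; the array is never shrunk. */
    private void resize() {
        Entry[] old = buckets;
        buckets = new Entry[old.length * 2];
        for (Entry head : old) {
            for (Entry e = head; e != null; ) {
                Entry next = e.next;
                int idx = e.key & (buckets.length - 1);
                e.next = buckets[idx];
                buckets[idx] = e;
                e = next;
            }
        }
    }

    /** Nulls every bucket slot and returns how many slots were visited. */
    int clear() {
        int visited = 0;
        for (int i = buckets.length - 1; i >= 0; i--) {
            buckets[i] = null;
            visited++;
        }
        size = 0;
        return visited;
    }

    int size() { return size; }
    int capacity() { return buckets.length; }

    public static void main(String[] args) {
        BucketTableSketch table = new BucketTableSketch(16);
        for (int i = 0; i < 1000; i++) table.add(i);
        System.out.println("slots visited by first clear: " + table.clear());
        for (int i = 0; i < 10; i++) table.add(i); // far fewer entries this time
        System.out.println("slots visited by second clear: " + table.clear());
    }
}
```

In this sketch the second clear() visits as many slots as the first one, even though only 10 entries are present, because the bucket array kept the capacity it reached earlier.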
One other interesting thing to note is that after all objects have been collected and pm.deleteAll() + commit() are called, there is quite an increase in memory usage. After collecting the objects the memory usage is 40 MB; after committing the deleteAll() the memory usage is 91 MB. So the memory usage more than doubles even though all objects to delete have already been loaded into memory! This probably needs to be investigated in a separate issue. After the commit, the memory usage nicely drops back to its level when the transaction started. In the second run, the memory usage peaks at 105 MB, but this increase of roughly 15 MB might be related to the implementation of clear().
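For anyone who wants to sample comparable numbers from inside the testcase rather than from a profiler, a rough sketch using java.lang.Runtime follows. The class HeapSampleSketch and its helper names are hypothetical, the byte array is only a stand-in for the collect and delete steps, and the values are approximate because the garbage collector can run at any time.

```java
import java.util.Locale;

/** Hypothetical helper for sampling approximate heap usage around a step. */
class HeapSampleSketch {
    /** Approximate number of heap bytes in use at this instant. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Formats a byte count as megabytes with one decimal place. */
    static String toMb(long bytes) {
        return String.format(Locale.ROOT, "%.1f MB", bytes / (1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        // Stand-in for the work being measured (collecting, deleting, committing).
        byte[][] ballast = new byte[32][1024 * 1024];
        long after = usedHeapBytes();
        System.out.println("before: " + toMb(before));
        System.out.println("after:  " + toMb(after));
        System.out.println("held arrays: " + ballast.length);
    }
}
```

Sampling once after collecting and once after the commit would give two figures that can be compared between the first and the second run.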
Btw, if you could send me the patched jar file I could run the test as well.