Bug 41921 - JDBC sampler : Add hashing of Data to avoid storing all output into memory when result is arbitrarily large
Status: NEW
Product: JMeter
Classification: Unclassified
Component: Main
Version: 2.2
Hardware/OS: All / All
Importance: P2 enhancement
Assigned To: JMeter issues mailing list
Reported: 2007-03-21 10:29 UTC by Nathan Bryant
Modified: 2012-10-24 01:48 UTC
Description Nathan Bryant 2007-03-21 10:29:40 UTC
JDBCSampler (and, I presume, other samplers) stores all of the output received
from its test action. For example:

Data data = getDataFromResultSet(rs);
res.setResponseData(data.toString().getBytes());

This is poor software design because the data could be arbitrarily large and
fill memory. It is causing OutOfMemoryErrors for us, even with relatively few
threads. This is major or even critical because it prevents JMeter from being
used to generate significant load. All samplers should be rewritten to build an
MD5 hash iteratively. The hash should be updated one buffer or row at a time
instead of in bulk.
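The iterative approach described above can be sketched with java.security.MessageDigest. The class and method names below are illustrative only, not JMeter's actual sampler API:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class IterativeHash {
    // Sketch of the proposed fix: feed each buffer/row into the digest as it
    // arrives, instead of accumulating the whole response in memory first.
    public static String hashChunks(Iterable<byte[]> chunks) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (byte[] chunk : chunks) {
            md.update(chunk);  // constant extra memory per chunk
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Incremental update produces the same digest as hashing the whole concatenated response in one call, so memory use is bounded by the largest single chunk rather than by the total result size.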
Comment 1 Sebb 2007-03-21 10:59:04 UTC
The full sample results are needed for some purposes - e.g. the Tree View 
Listener can display the results of an HTTP Sample, and Assertions need the 
response to be present - so it would not make sense to _always_ throw away the 
response data.

And the data needs to be retrieved, otherwise the sample time will not be 
representative.

However, sometimes it is not ideal to store all the response data.

As a workaround you could perhaps do one of the following:
* change the query to limit the amount of data returned
* add a BeanShell Post-Processor to clear the responseData field.
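The second workaround might look something like the one-line BeanShell Post-Processor script below; "prev" is the JMeter script variable holding the just-completed SampleResult. One caveat: JMeter runs Post-Processors before Assertions, so any assertion that inspects the response body would then see it empty.

```java
// BeanShell Post-Processor: discard the stored response body so it is not
// retained in memory. "prev" refers to the SampleResult just produced.
prev.setResponseData(new byte[0]);
```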

As to how to fix this: there could be an option to limit the size of the stored 
data. That should be fairly easy to do.
Comment 2 Nathan Bryant 2007-03-21 11:47:00 UTC
An MD5 or similar hash would be preferable over just storing part of the data,
for people who are using the data for functional testing. Then they could
compare everything for identity at least. I'm not doing functional testing so I
don't care, but I would recommend adding a configuration checkbox for an MD5 mode.
Comment 3 Sebb 2008-04-07 08:49:44 UTC
I've been looking into how to add hashing to the JDBC sampler.

It would be easy enough to collect all the response data and convert it to a hash just before storing it. Would that be enough for your tests? The disadvantage of this approach is that JMeter would need enough memory to store the whole response - but at least it would be only temporary.

A better solution would be to hash the data as it is retrieved. However, this is not particularly easy to do, as all the data is fetched first and then formatted into lines and columns.

Also, is it important that the hash be the same as the one that would be obtained by hashing the result data after download? Or does it just need to cover all the response data in a predictable order? The latter would be easier to do, as there would be no need for the second formatting stage.
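A retrieval-order hash of the kind asked about might look like the sketch below, which digests each column value as it arrives and inserts separator bytes so that different row/column splits of the same text cannot collide. The rows are simulated here as string arrays; none of this is JMeter's actual JDBC sampler code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;

public class RowOrderHash {
    // Hash each column value in retrieval order. The separator bytes keep
    // e.g. ("ab","c") and ("a","bc") from hashing to the same value.
    public static String hashRows(List<String[]> rows) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (String[] row : rows) {
            for (String col : row) {
                md.update(col.getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0x1F);  // unit separator between columns
            }
            md.update((byte) 0x1E);      // record separator between rows
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Such a hash is deterministic for a given result set but deliberately differs from a hash of the formatted lines-and-columns output, which is the trade-off the question raises.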

Any other suggestions for how to process the JDBC data are welcome...
Comment 4 Gregg 2009-06-19 12:59:55 UTC
For my own curiosity, what magnitude of data is being dealt with here?  Are we talking hundreds of megabytes? Gigabytes?  Tens or hundreds of gigabytes?  The reason I ask is because my first thought was to simply have the user increase the maximum heap size of the JVM.  What is the user currently using as the maximum heap size?
Comment 5 Philippe Mouawad 2011-11-14 12:12:14 UTC
Still missing in 2.5.1
Comment 6 Evan M 2012-06-25 15:43:25 UTC
Gregg: Increasing the JVM memory does not help.  The order of magnitude is gigabytes of data for me, but it doesn't really matter, because the application just ramps up memory until it runs out.  I should be able to run a test for an arbitrarily long amount of time if I don't need to store the result data.

For my use case, I want to test the maximum throughput of a large select statement from my webserver to my database, but the application caps out its memory before I can get any useful data.  If I don't have any listeners that need the response data, it should not be cached.

I am having this issue running 2.7 r1342410 on Windows Server 2008.