A few comments from the author of this monstrosity. First, thanks Ferdy for taking the time to work on this; it's much appreciated, and we need to move forward on it. I agree that ultimately this test should be moved to Gora and become part of a larger test suite that verifies the correctness of concurrent multi-threaded and multi-process operations.
However, the immediate purpose of this class was to stress-test the existing Gora versions in usage patterns typical for Nutch, in order to verify that a particular version of Gora is a viable storage layer for Nutch - so the test tries to replicate typical Nutch scenarios. Remember that this has to work not only for a toy crawl in a single JVM in local mode, but also for a fully distributed parallel map-reduce crawl. Consequently:
- testMultiThread: tests a scenario of multiple threads in a single JVM all writing to the same storage instance. This replicates a scenario present e.g. in a single Fetcher task. If this test fails (assuming it's properly constructed!) then this means that Gora will fail, perhaps silently (see NUTCH-893), in a fundamental Nutch tool.
- testMultiProcess: tests a scenario of multiple processes running in multiple JVMs all writing to the same storage instance. This replicates a scenario of multiple map-reduce tasks all using the same storage config (shared storage, e.g. HSQLDB in server mode), and it's fundamental to all Nutch tools running on a cluster. In map-reduce jobs there are usually many concurrent tasks. Some of them may execute in several copies in parallel (speculative execution), and others may fail catastrophically without proper cleanup - and Gora backends must just deal with it. If this test fails (again, assuming it's properly constructed and doesn't exceed some OS capabilities of the test machine, or some known limits of a storage implementation, like the number of concurrent connections) then it means that Gora storage is not reliable for typical map-reduce usage, which sort of defeats the point of using it at all.
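To make the testMultiThread scenario concrete, here is a minimal sketch of the pattern: several threads start together, write disjoint keys to one shared store, and every record is read back afterwards to count silently lost writes. This is not the actual test and does not use the Gora API; the `Store` interface, `InMemoryStore` and `countLostWrites` are hypothetical names, and the in-memory map only stands in for a real backend.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MultiThreadWriteSketch {

    // Hypothetical stand-in for a shared storage instance; NOT the Gora DataStore API.
    interface Store {
        void put(String key, String value);
        String get(String key);
    }

    // Assumption: a thread-safe in-memory map replaces the real backend.
    static class InMemoryStore implements Store {
        private final Map<String, String> rows = new ConcurrentHashMap<>();
        public void put(String key, String value) { rows.put(key, value); }
        public String get(String key) { return rows.get(key); }
    }

    /** Returns the number of records that are missing or wrong after all writers finish. */
    static int countLostWrites(Store store, int numThreads, int recordsPerThread)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        CountDownLatch start = new CountDownLatch(1);

        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            pool.submit(() -> {
                try {
                    start.await(); // hold all writers until released together
                    for (int i = 0; i < recordsPerThread; i++) {
                        store.put("thread-" + id + "-rec-" + i, "value-" + i);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        start.countDown(); // release all writers at once
        pool.shutdown();
        if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
            throw new IllegalStateException("writers did not finish in time");
        }

        // Read every record back; a write that failed silently shows up here.
        int lost = 0;
        for (int t = 0; t < numThreads; t++) {
            for (int i = 0; i < recordsPerThread; i++) {
                String expected = "value-" + i;
                if (!expected.equals(store.get("thread-" + t + "-rec-" + i))) {
                    lost++;
                }
            }
        }
        return lost;
    }

    public static void main(String[] args) throws Exception {
        int lost = countLostWrites(new InMemoryStore(), 8, 1000);
        System.out.println("lost writes: " + lost);
    }
}
```

The read-back step is the important part: an exception thrown inside a writer is swallowed by the executor, so only verifying the stored records reveals a silent failure. A multi-process variant would run the same writer loop from separate JVMs against a shared server-mode store.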
To summarize: I think the patch in its current form helps the tests pass, but I don't think it addresses the underlying problems in Gora (or perhaps the problems with the HSQLDB backend); rather, it hides them. After all, we want the test to mean something if it passes: to verify that we can use Gora for more than a toy crawl, with guarantees of correctness in the presence of concurrent updates.
If the above errors don't indicate issues with Gora, but are instead caused by exceeding OS or HSQLDB limits, or by HSQLDB misconfiguration, then of course we should fix the configs and adjust the numbers so that they make sense. But with the proper config and proper numbers both tests should pass; otherwise we can't be sure that Gora is working properly at all.