Presently scan queries over local partitions are not executed optimally.
If to run a scan query over a specific partition (by setting query.setPartition(...) parameter and or query.setLocal(true)) and start iterating over entries we will see that the Thread, that iterates over the data, waits for some event to happen.
In fact the Thread waits while a system pool's thread prepares an iterator with entries for it and only after that iterates over the returned result set. The flow looks this way:
- GridCacheLocalQueryFuture is created;
- when QueryCursor.iterator().next is called from the app thread (the Thread above), GridCacheLocalQueryFuture.execute() methods puts closure that will prepare content for the iterator in the system pool.
- a system Thread execute GridCacheQueryManager.runQuery() reading all the entries from partition and passing them back to the Thread at line 1553 by calling onPageReady(...) method.
The other bottleneck is that a system thread gets all the entries and passes them to the Thread which will lead to more garbaged Java heap especially if cache is OFFHEAP_TIRED.
Run attached test (ScanQueryStuff) and you will see with Visual VM that most of the time the test spends executing the code from system threads.
Finally, what have to be done:
- if ScanQuery is supposed to be executed locally (setPartition() refers to local partition or setLocal is set to true) then the calling application thread has to iterate over the data avoiding usage of the system pool;
- internal code mustn't read all entries from a partition initially. The iterator has to get one entry next after another. This will be a memory backpressure mechanism especially for OFFHEAP_TIRED.
My assumption is that the fixed version has to work in a similar way to iteration over local entries - cache.localEntries(CachePeekMode.PRIMARY);. Run attached LocalIteratorStuff to see with Visual VM that the application thread is fully utilized and system threads are idle.