This was raised by the Phoenix team. During a profiling session, we noticed that catching the joinedHeap up to the current row via seek causes a performance regression. As a result, the joinedHeap is only efficient when the filter matches either a high or a low percentage of rows.
(A high percentage is fine because the joinedHeap does not fall behind as often and rarely needs to be caught up; a low percentage is fine because the seek does not happen frequently.)
In our tests, the fix turned out to be simple: replace seek with reseek. Patch coming soon.
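To make the seek-versus-reseek trade-off concrete, here is a toy model. It is not HBase code: `BlockScanner`, `catchUp` and `everyNth` are hypothetical names, and the cost model is a simplification. It assumes that a seek always goes back to the block index, while a reseek (which requires the target to be at or after the current key) stays in the current block as long as the target sorts before the next block's first key. The simulation catches a "joined" scanner up to each row accepted by a filter on the essential family and counts index lookups.

```java
import java.util.stream.IntStream;

/** Toy model (not HBase code): catching a joined scanner up via seek vs. reseek. */
public class JoinedHeapSeekSketch {

    /** Hypothetical scanner over sorted row keys stored in fixed-size blocks. */
    static final class BlockScanner {
        private final int[] keys;    // sorted row keys of the non-essential family
        private final int blockSize; // keys per block; block b starts at keys[b * blockSize]
        private int pos = 0;         // index of the current key; keys.length means exhausted
        int indexLookups = 0;        // how often the block index was consulted

        BlockScanner(int[] keys, int blockSize) {
            this.keys = keys;
            this.blockSize = blockSize;
        }

        boolean exhausted() { return pos >= keys.length; }
        int current() { return keys[pos]; }
        void next() { pos++; }

        /** seek: always consults the block index, then scans forward inside the chosen block. */
        void seek(int target) {
            indexLookups++;
            int numBlocks = (keys.length + blockSize - 1) / blockSize;
            int lo = 0, hi = numBlocks - 1, block = 0;
            while (lo <= hi) { // binary search for the last block whose first key is <= target
                int mid = (lo + hi) >>> 1;
                if (keys[mid * blockSize] <= target) { block = mid; lo = mid + 1; } else { hi = mid - 1; }
            }
            pos = block * blockSize;
            scanForward(target);
        }

        /** reseek: requires target >= current key; skips the index while target is in this block. */
        void reseek(int target) {
            int nextBlockStart = (pos / blockSize + 1) * blockSize;
            if (nextBlockStart < keys.length && target >= keys[nextBlockStart]) {
                seek(target);        // target lies past the current block: fall back to the index
            } else {
                scanForward(target); // target is within the current block: no index lookup
            }
        }

        private void scanForward(int target) {
            while (pos < keys.length && keys[pos] < target) pos++;
        }
    }

    /** Rows 0, step, 2*step, ... below numRows. */
    public static int[] everyNth(int numRows, int step) {
        return IntStream.range(0, (numRows + step - 1) / step).map(i -> i * step).toArray();
    }

    /** Returns the number of index lookups needed to keep the joined scanner in step. */
    public static int catchUp(int[] matchedRows, int[] joinedKeys, int blockSize, boolean useReseek) {
        BlockScanner joined = new BlockScanner(joinedKeys, blockSize);
        for (int row : matchedRows) {
            // Reposition only when the joined scanner has fallen behind the current row.
            if (!joined.exhausted() && joined.current() < row) {
                if (useReseek) joined.reseek(row); else joined.seek(row);
            }
            // "Read" this row's joined cells, leaving the scanner on the following row.
            if (!joined.exhausted() && joined.current() == row) joined.next();
        }
        return joined.indexLookups;
    }

    public static void main(String[] args) {
        int numRows = 1000, blockSize = 100;
        int[] joinedKeys = everyNth(numRows, 1); // every row has cells in the joined family
        for (int step : new int[] {1, 3, 250}) { // the filter accepts every step-th row
            int[] matched = everyNth(numRows, step);
            System.out.printf("every %d rows: seek=%d, reseek=%d index lookups%n", step,
                    catchUp(matched, joinedKeys, blockSize, false),
                    catchUp(matched, joinedKeys, blockSize, true));
        }
    }
}
```

In this model, matching every row never repositions the joined scanner (0 lookups either way). Matching every 250th row costs the same with both calls (3 lookups), because each target lies in a different block. The middle case, every 3rd row, is where seek pays for an index lookup on nearly every matched row (333) while reseek only does so at a few block boundaries (6).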
|Field|Original Value|New Value|
|---|---|---|
|Component/s| |Filters [ 12312133 ]|
|Component/s| |Performance [ 12314193 ]|
|Component/s| |regionserver [ 12312139 ]|
|Status|Open [ 1 ]|Patch Available [ 10002 ]|
|Summary|JoinedHeap for essential column families should reseek instead of seek|JoinedHeap for non essential column families should reseek instead of seek|
|Status|Patch Available [ 10002 ]|Resolved [ 5 ]|
|Resolution| |Fixed [ 1 ]|
|Status|Resolved [ 5 ]|Closed [ 6 ]|
|Transition|Time In Source Status|Execution Times|Last Executer|Last Execution Date|
|---|---|---|---|---|
|Open → Patch Available|33m 48s|1|Ted Yu|10/Apr/13 06:15|
|Patch Available → Resolved|17h 1m|1|Lars Hofhansl|10/Apr/13 23:16|
|Resolved → Closed|16d 17h 38m|1|Lars Hofhansl|27/Apr/13 16:55|