Did some profiling again.
I think we can gain some performance by passing buffer, rowoffset, and rowlength instead of making a copy of the row key.
That way we can also remove the row key caching (this patch also removes the timestamp caching). Considering the sheer number of KVs we create, every byte saved is good.
(The gain is 15-20% when the data is in the block cache and we set up a Filter such that only a single row is returned to the client.)
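
To illustrate the idea, here is a minimal sketch in Java. It is not the patch itself: the class and method names (RowKeySketch, matchesRowWithCopy, matchesRow) are hypothetical and chosen only for this example. It contrasts copying the row key out of the backing buffer with comparing it in place via buffer, rowOffset, and rowLength.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and not taken from the patch.
final class RowKeySketch {

    // Copying approach: allocates a fresh byte[] for the row key on every call.
    static boolean matchesRowWithCopy(byte[] buffer, int rowOffset, int rowLength, byte[] target) {
        byte[] row = Arrays.copyOfRange(buffer, rowOffset, rowOffset + rowLength);
        return Arrays.equals(row, target);
    }

    // In-place approach: compares the row key inside the backing buffer,
    // so no per-call allocation is needed.
    static boolean matchesRow(byte[] buffer, int rowOffset, int rowLength, byte[] target) {
        if (rowLength != target.length) {
            return false;
        }
        for (int i = 0; i < rowLength; i++) {
            if (buffer[rowOffset + i] != target[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed layout for the example: two bytes of padding, the row key, two more bytes.
        byte[] buffer = "xxrow1yy".getBytes(StandardCharsets.US_ASCII);
        byte[] target = "row1".getBytes(StandardCharsets.US_ASCII);
        System.out.println(matchesRowWithCopy(buffer, 2, 4, target));
        System.out.println(matchesRow(buffer, 2, 4, target));
    }
}
```

Both methods give the same answer; the second simply avoids creating a temporary array for each comparison, which is where the saving would come from when this runs once per KV.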