Description
I locally tweaked tpch_real_world to use hash partitioning instead of range partitioning, so that the different threads overlap on the same tablets, simulating a more realistic parallel load scenario. With that change, I noticed that the MM threads were CPU-bound, with a high percentage of CPU time spent in AddCodeWords(). Initial prototypes indicate that optimizing the hashmap used there would be an easy win.
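
For readers unfamiliar with this kind of hot path, here is a minimal sketch of a dictionary-encoding loop in which every value costs one hashmap lookup. It is not the actual AddCodeWords() code and not the prototype mentioned above. The DictEncoder class, its members, and the choice of std::unordered_map keyed by std::string_view are all assumptions made for illustration. It shows two generic ways to cut per-lookup cost: probing without copying the string, and reserving buckets up front to avoid rehashing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <limits>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical illustration only; not the real code behind AddCodeWords().
class DictEncoder {
 public:
  // Reserving buckets up front avoids repeated rehashing as the
  // dictionary grows.
  explicit DictEncoder(std::size_t expected_distinct = 1024) {
    codes_.reserve(expected_distinct);
  }

  // The map keys are views into words_, so copying would leave them dangling.
  DictEncoder(const DictEncoder&) = delete;
  DictEncoder& operator=(const DictEncoder&) = delete;

  // Appends the codeword for each input value to `out`, assigning a new
  // codeword the first time a value is seen.
  void AddCodeWords(const std::vector<std::string_view>& values,
                    std::vector<std::uint32_t>* out) {
    out->reserve(out->size() + values.size());
    for (std::string_view v : values) {
      auto it = codes_.find(v);  // lookup without copying the string
      if (it == codes_.end()) {
        assert(words_.size() < std::numeric_limits<std::uint32_t>::max());
        const auto code = static_cast<std::uint32_t>(words_.size());
        // std::deque never relocates existing elements on emplace_back, so
        // the string_view key stays valid for the encoder's lifetime.
        words_.emplace_back(v);
        it = codes_.emplace(std::string_view(words_.back()), code).first;
      }
      out->push_back(it->second);
    }
  }

  std::size_t dictionary_size() const { return words_.size(); }
  const std::string& word(std::uint32_t code) const { return words_[code]; }

 private:
  std::deque<std::string> words_;                              // owns the strings
  std::unordered_map<std::string_view, std::uint32_t> codes_;  // value -> codeword
};
```

In a loop like this, the hash computation and bucket probe run once per input value, so the choice of hash function and map implementation affects throughput directly. Whether these particular changes help in this case would need to be confirmed by profiling.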