Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
The TeraSort sampler, which reads from multiple input splits to compute the partition information, can be made multi-threaded so that several threads read from different splits concurrently. This should improve sampling performance, and it would also let us sample more records to arrive at better partition information.
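A minimal sketch of the proposed idea follows, assuming each split can be sampled independently of the others. To stay self-contained it models each split as an in-memory list of keys and samples the first `samplesPerSplit` records of each one. `ParallelSplitSampler`, `readSample`, `sampleConcurrently` and `createPartitions` are hypothetical names, not the TeraSort API; a real implementation would open a record reader per input split instead of reading a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical multi-threaded partition sampler (sketch only).
 * Assumption: a "split" is an in-memory List<String> of keys.
 */
public class ParallelSplitSampler {

    /** Hypothetical per-split reader: takes up to samplesPerSplit keys from the start of the split. */
    static List<String> readSample(List<String> split, int samplesPerSplit) {
        int n = Math.min(samplesPerSplit, split.size());
        return new ArrayList<>(split.subList(0, n));
    }

    /** Samples all splits concurrently on a fixed-size thread pool and merges the results. */
    static List<String> sampleConcurrently(List<List<String>> splits, int samplesPerSplit, int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> split : splits) {
                // One task per split; tasks share no mutable state.
                futures.add(pool.submit(() -> readSample(split, samplesPerSplit)));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get()); // waits for this split's sample
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    /** Sorts the samples and picks numPartitions - 1 evenly spaced cut points. */
    static List<String> createPartitions(List<String> samples, int numPartitions) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("no samples");
        }
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        List<String> cuts = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            // Integer arithmetic keeps the index strictly below sorted.size().
            int idx = (int) ((long) sorted.size() * i / numPartitions);
            cuts.add(sorted.get(idx));
        }
        return cuts;
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("d", "a", "q"),
                Arrays.asList("m", "z", "c"),
                Arrays.asList("k", "b", "x"));
        List<String> samples = sampleConcurrently(splits, 2, 3);
        System.out.println(createPartitions(samples, 3)); // prints [d, m]
    }
}
```

With the three toy splits in `main`, two keys are taken from each split (`d a m z k b`), sorted to `a b d k m z`, and cut at indices 2 and 4, so the program prints `[d, m]`. A fixed-size pool bounds how many splits are read at once, and collecting results through the futures in submission order keeps the merged sample deterministic. Raising `samplesPerSplit` is how the second benefit mentioned above, a larger sample for better partitions, would be obtained.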