Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Won't Fix
-
0.2.0
-
None
-
None
-
None
Description
Currently order by is done in three MR jobs:
job 1: read data in whatever loader the user requests, store using BinStorage
job 2: load using RandomSampleLoader, find quantiles
job 3: load data again and sort
It is done this way because RandomSampleLoader extends BinStorage, and so needs the data in that format to read it.
If the logic in RandomSampleLoader was made into an operator instead of being in a loader then jobs 1 and 2 could be merged. On average job 1 takes about 15% of the time of an order by script.