Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Currently, we create one RDD per bucket, coalesce each of them to a single partition, and finally union them into the final RDD. We should instead create a single RDD directly. This requires a small change to the data source interface, and the reader logic must be abstracted out so that it is decoupled from the RDD.
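A plain-Python model of the proposed single-RDD layout may help make the change concrete. It is a sketch under stated assumptions, not Spark's implementation: SingleBucketedScan, group_by_bucket and the Reader callable are hypothetical names, partitions are modeled as Python lists and generators rather than real RDD partitions, and every input file is assumed to come with a known bucket id. Partition i of the single scan reads all files of bucket i, so the partition count equals the bucket count without any per-bucket coalesce or final union, and the reader is a plain function that knows nothing about RDDs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical reader abstraction: given one file path, yield its rows.
# It has no dependency on any RDD class (the decoupling the issue asks for).
Reader = Callable[[str], Iterator[Tuple]]


def group_by_bucket(files: List[Tuple[str, int]], num_buckets: int) -> List[List[str]]:
    """Return one list of file paths per bucket id; empty buckets get []."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for path, bucket_id in files:
        buckets[bucket_id].append(path)
    return [buckets[b] for b in range(num_buckets)]


class SingleBucketedScan:
    """Models ONE RDD whose partition i reads exactly the files of bucket i.

    This replaces the old plan: one RDD per bucket, each coalesced to a
    single partition, followed by a union of all of them.
    """

    def __init__(self, files_per_bucket: List[List[str]], reader: Reader):
        self.files_per_bucket = files_per_bucket
        self.reader = reader

    def num_partitions(self) -> int:
        # One partition per bucket, fixed when the scan is built.
        return len(self.files_per_bucket)

    def compute(self, partition: int) -> Iterator[Tuple]:
        # Read this bucket's files one after another with the shared reader,
        # so rows from different buckets never land in the same partition.
        for path in self.files_per_bucket[partition]:
            yield from self.reader(path)


if __name__ == "__main__":
    # Toy in-memory "files" standing in for real bucket files (illustrative only).
    data = {"f0": [(1, "a"), (3, "c")], "f1": [(2, "b")], "f2": [(5, "e")]}
    files = [("f0", 0), ("f1", 1), ("f2", 0)]
    scan = SingleBucketedScan(group_by_bucket(files, 3), lambda p: iter(data[p]))
    for i in range(scan.num_partitions()):
        print(i, list(scan.compute(i)))
```

Because a partition never mixes files from two buckets, the model keeps the property the per-bucket coalesce originally provided, while the scan itself is built once rather than once per bucket.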