Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
Grouping currently fetches splits from the underlying file format.
It'd be useful to allow grouping to accept a set of splits instead of always fetching them from the underlying format.
One example of where this will be used: bucketed Hive data. Regular HiveInputFormat splits are generated, but only splits belonging to the same bucket can be grouped together.
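
A minimal sketch of the idea, not the actual patch: grouping takes a caller-supplied list of splits and never mixes buckets within a group. The `Split` class, the `bucketId` field, the `group` method, and the `targetBytes` threshold are all hypothetical names invented for this illustration; they are not Tez or Hive APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketAwareGrouping {

    // Hypothetical stand-in for an input split; not a Tez or Hive type.
    static final class Split {
        final String path;
        final int bucketId;
        final long length;

        Split(String path, int bucketId, long length) {
            this.path = path;
            this.bucketId = bucketId;
            this.length = length;
        }
    }

    // Groups splits handed in by the caller instead of fetching them from
    // an input format. A group only ever contains splits from one bucket,
    // and is closed when adding the next split would exceed targetBytes.
    static List<List<Split>> group(List<Split> splits, long targetBytes) {
        // Partition by bucket first; TreeMap keeps bucket order stable.
        Map<Integer, List<Split>> byBucket = new TreeMap<>();
        for (Split s : splits) {
            byBucket.computeIfAbsent(s.bucketId, k -> new ArrayList<>()).add(s);
        }

        List<List<Split>> groups = new ArrayList<>();
        for (List<Split> bucketSplits : byBucket.values()) {
            List<Split> current = new ArrayList<>();
            long currentBytes = 0;
            for (Split s : bucketSplits) {
                if (!current.isEmpty() && currentBytes + s.length > targetBytes) {
                    groups.add(current);
                    current = new ArrayList<>();
                    currentBytes = 0;
                }
                current.add(s);
                currentBytes += s.length;
            }
            if (!current.isEmpty()) {
                groups.add(current);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("bucket_0/part-a", 0, 60));
        splits.add(new Split("bucket_1/part-a", 1, 40));
        splits.add(new Split("bucket_0/part-b", 0, 30));
        splits.add(new Split("bucket_1/part-b", 1, 80));

        for (List<Split> g : group(splits, 100)) {
            StringBuilder sb = new StringBuilder("bucket " + g.get(0).bucketId + ":");
            for (Split s : g) {
                sb.append(' ').append(s.path);
            }
            System.out.println(sb);
        }
    }
}
```

With the sample input, the two bucket 0 splits fit in one group, while the two bucket 1 splits exceed the 100-byte target together and end up in separate groups. Partitioning by bucket before packing is one simple way to enforce the constraint; a real grouper would likely also weigh locality.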