Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
Reviewed
-
Description
We frequently run in the situation where Pig needs to deal with small files in the input. In this case a separate map is created for each file which could be very inefficient.
It would be greate to have an umbrella input format that can take multiple files and use them in a single split. We would like to see this working with different data formats if possible.
There are already a couple of input formats doing similar thing: MultifileInputFormat as well as CombinedInputFormat; howevere, neither works with ne Hadoop 20 API.
We at least want to do a feasibility study for Pig 0.8.0.
Attachments
Attachments
Issue Links
- relates to
-
PIG-2462 getWrappedSplit is incorrectly returning the first split instead of the current split.
- Closed