Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- 0.14.0
- None
- None
Description
I ran into a problem when using Crunch to process a large amount of data from S3: the getSize checks can be very slow to run, and they add little to the overall processing of a pipeline when settings such as reducer counts are specified manually. I'd like to add a way to disable the file size checks, either globally or for specific input sources.
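As a rough illustration of the request, here is a minimal sketch of a flag-guarded size check. It is not Crunch code: the property name `example.input.size.check.disabled`, the `UNKNOWN_SIZE` placeholder, and the `sizeOrUnknown` helper are all hypothetical, and a plain `Map` stands in for the job configuration. A real change would also need to decide what size the planner should assume when the check is skipped.

```java
import java.util.Map;
import java.util.function.LongSupplier;

public class SizeCheckSketch {
    // Hypothetical property name; not an existing Crunch setting.
    public static final String SKIP_SIZE_CHECK = "example.input.size.check.disabled";

    // Hypothetical placeholder returned when the size check is skipped.
    public static final long UNKNOWN_SIZE = -1L;

    // Runs the (potentially slow) size computation only when the flag is not set.
    public static long sizeOrUnknown(Map<String, String> conf, LongSupplier expensiveSizeCheck) {
        boolean skip = Boolean.parseBoolean(conf.getOrDefault(SKIP_SIZE_CHECK, "false"));
        if (skip) {
            return UNKNOWN_SIZE; // no file system calls are made
        }
        return expensiveSizeCheck.getAsLong();
    }

    public static void main(String[] args) {
        // Stand-in for a slow listing of an S3 path.
        LongSupplier slowListing = () -> 1024L;

        // Flag absent: the size check runs as usual.
        System.out.println(sizeOrUnknown(Map.of(), slowListing));

        // Flag set: the size check is skipped.
        System.out.println(sizeOrUnknown(Map.of(SKIP_SIZE_CHECK, "true"), slowListing));
    }
}
```

A per-source variant could follow the same shape, with each input source carrying its own flag and falling back to the global setting when the flag is unset.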
Issue Links
- relates to: CRUNCH-683 Avoid unnecessary listStatus calls from getSize computation (Resolved)