Description
Currently, we enable the CombineFileInputFormat settings by default for every extension of FileSourceImpl, because doing so tends to improve performance for common file formats such as text, sequence files, and Avro files. However, this default has caused problems for formats like Parquet and for custom file formats that have complex split logic.
This JIRA is to track modifying the default combine file settings in at least some contexts, such as with From.formattedFile for custom input formats.
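One way to picture the problem is the toy model below. It is a sketch under stated assumptions, not Crunch or Hadoop code: the class CombineSplitSketch, the SplitPlanner interface, and every method name are hypothetical. It assumes, as the title of the linked CRUNCH-369 suggests, that combine-style planning does not consult the custom format's own split logic.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are Crunch or Hadoop classes.
class CombineSplitSketch {

    // Hypothetical stand-in for an input format's split logic.
    // Each returned entry is {offset, length} within a single file.
    interface SplitPlanner {
        List<long[]> plan(long fileLength);
    }

    // Cuts a file into consecutive chunks of at most chunkSize bytes.
    // chunkSize is assumed to be positive.
    static List<long[]> chunk(long fileLength, long chunkSize) {
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += chunkSize) {
            splits.add(new long[] {offset, Math.min(chunkSize, fileLength - offset)});
        }
        return splits;
    }

    // Custom format: split boundaries must fall on record-group boundaries.
    static SplitPlanner alignedPlanner(long groupSize) {
        return fileLength -> chunk(fileLength, groupSize);
    }

    // Combine-style planning: packs bytes up to a target size and never
    // looks at the custom planner.
    static SplitPlanner combinePlanner(long targetSize) {
        return fileLength -> chunk(fileLength, targetSize);
    }

    // The default under discussion: when combining is enabled,
    // the custom planner is bypassed entirely.
    static SplitPlanner choose(SplitPlanner custom, boolean combineEnabled, long targetSize) {
        return combineEnabled ? combinePlanner(targetSize) : custom;
    }
}
```

In this model, a 250-byte file with 100-byte record groups yields three aligned splits when combining is disabled, but a single 250-byte split when combining is enabled with a 1000-byte target. A reader that needs group-aligned boundaries would then receive a split it was never designed to handle, which is the kind of breakage a per-format default is meant to avoid.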
Attachments
Issue Links
- is duplicated by CRUNCH-369 Crunch doesn't use custom getSplits functions of FileInputFormat subclasses (Closed)