  1. Crunch (Retired)
  2. CRUNCH-369

Crunch doesn't use custom getSplits functions of FileInputFormat subclasses


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: IO
    • Labels:
      None

      Description

      Suppose I create a source for a custom InputFormat which is a subclass of FileInputFormat; e.g.

      TableSource<LongWritable, FastQWritable> source = From.formattedFile(
      inputFile, FastQInputFormatNew.class, LongWritable.class,
      FastQWritable.class);

      where FastQInputFormatNew is a subclass of FileInputFormat.

      This won't work as expected because, by default, CrunchInputFormat.getSplits ends up using CrunchCombineFileInputFormat to split the file. My custom FileInputFormat subclass relies on its own file splitter, so its getSplits function is never used.

      I can work around this by explicitly disabling the combining, e.g.

      source.inputConf(RuntimeParameters.DISABLE_COMBINE_FILE, Boolean.TRUE.toString());

      but this doesn't strike me as the best solution. If I tell Crunch to use a custom InputFormat, I shouldn't have to set a second config option just to make Crunch respect the getSplits function in that InputFormat.

      I think CrunchInputFormat.getSplits should check that the format class exactly matches FileInputFormat, i.e. that it isn't a subclass. For subclasses, Crunch should use the getSplits function in the custom InputFormat class. I think changing the check to the following might work:

      if (format.getClass().equals(FileInputFormat.class) &&
          !conf.getBoolean(RuntimeParameters.DISABLE_COMBINE_FILE, true))
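
      To illustrate the difference between a subclass-inclusive check and the exact-class check proposed above, here is a minimal sketch. It does not use the real Hadoop or Crunch types: BaseFileFormat, CustomFormat, and both helper methods are hypothetical stand-ins, and it assumes the existing check accepts any subclass (for example via instanceof), which is what the behaviour described above suggests.

```java
public class SplitCheckSketch {

    // Hypothetical stand-in for FileInputFormat. The real Hadoop class is
    // abstract; this one is concrete only so that it can be instantiated here.
    static class BaseFileFormat {
    }

    // Hypothetical stand-in for a custom subclass with its own getSplits logic.
    static class CustomFormat extends BaseFileFormat {
    }

    // Assumed current behaviour: every subclass of the base format is combined
    // unless combining has been explicitly disabled.
    static boolean combineWithInstanceof(Object format, boolean disableCombine) {
        return format instanceof BaseFileFormat && !disableCombine;
    }

    // Proposed behaviour: only the exact base class is combined, so a subclass
    // keeps its own splitting logic without any extra configuration.
    static boolean combineWithExactClass(Object format, boolean disableCombine) {
        return format.getClass().equals(BaseFileFormat.class) && !disableCombine;
    }

    public static void main(String[] args) {
        Object custom = new CustomFormat();
        Object base = new BaseFileFormat();

        System.out.println("instanceof, custom format:  " + combineWithInstanceof(custom, false));
        System.out.println("exact class, custom format: " + combineWithExactClass(custom, false));
        System.out.println("exact class, base format:   " + combineWithExactClass(base, false));
    }
}
```

      With the exact-class check, the custom format is left alone even when combining is enabled, while the explicit disable flag still switches combining off for the base format.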

              People

              • Assignee:
                Unassigned
                Reporter:
                jeremy@lewi.us Jeremy Lewi
              • Votes:
                0
                Watchers:
                3
