Hadoop Map/Reduce / MAPREDUCE-6596

MultipleInputs does not escape Path characters

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: mrv2
    • Release Note: Fixed path escaping in MultipleInputs for paths containing commas or semicolons.
    • Flags: Patch

    Description

      Paths containing commas or semicolons break MultipleInputs, because it uses these characters as delimiters when joining and storing the path names.

      MultipleInputs stores mapreduce.input.multipleinputs.dir.formats as:

      path;inputFormatClass,path2;inputFormatClass2[, ...]

      If a path contains one of these delimiter characters, getInputFormatMap and getMapperTypeMap will fail.
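
      The failure can be reproduced outside Hadoop with a minimal sketch. The class below is a hypothetical stand-in, not the Hadoop source: it assumes the stored value is parsed by splitting on ',' and then on ';', and it uses short format names in place of fully qualified class names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the parsing side of MultipleInputs (an assumption,
// not the actual Hadoop code): split mappings on ',' and each mapping on ';'.
class NaiveMultipleInputsParse {

    static Map<String, String> parse(String conf) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String mapping : conf.split(",")) {
            String[] parts = mapping.split(";");
            // A fragment without ';' has no parts[1] and throws here.
            result.put(parts[0], parts[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Well-behaved paths round-trip as expected.
        System.out.println(parse("/in/a.txt;TextInputFormat,/in/b.seq;SequenceFileInputFormat"));

        // A comma inside the path is mistaken for a mapping separator.
        try {
            parse("/in/a,b.txt;TextInputFormat");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("comma in path broke parsing");
        }

        // A semicolon inside the path silently yields a wrong mapping.
        System.out.println(parse("/in/a;b.txt;TextInputFormat"));
    }
}
```

      In this sketch a comma in the path raises an exception, while a semicolon is worse in a different way: the path is truncated to "/in/a" and "b.txt" is taken as the format name without any error at parse time.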

      FileInputFormat.addInputPath() handles the same problem with escapeString and unescapeString from StringUtils. I took the same approach for escaping in MultipleInputs.
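
      The escaping idea can be sketched in plain Java. The helpers below (escape, unescape, split) are hypothetical and written only for illustration; they are not the StringUtils methods and not the contents of the attached patch. The sketch assumes a backslash as the escape character.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helpers only (assumed names, backslash assumed as escape char).
class EscapedPathJoin {
    static final char ESC = '\\';

    // Prefix the escape character itself and each delimiter with ESC.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == ESC || c == ',' || c == ';') {
                out.append(ESC);
            }
            out.append(c);
        }
        return out.toString();
    }

    // Drop each ESC and keep the character that follows it verbatim.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ESC && i + 1 < s.length()) {
                c = s.charAt(++i);
            }
            out.append(c);
        }
        return out.toString();
    }

    // Split on sep, skipping separators preceded by ESC; pieces stay escaped.
    static List<String> split(String s, char sep) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ESC && i + 1 < s.length()) {
                cur.append(c).append(s.charAt(++i));
            } else if (c == sep) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    public static void main(String[] args) {
        String path = "/in/a,b;c.txt";
        // Escape the path before joining it with its format name.
        String conf = escape(path) + ";" + "TextInputFormat";
        for (String mapping : split(conf, ',')) {
            List<String> kv = split(mapping, ';');
            // prints "/in/a,b;c.txt -> TextInputFormat"
            System.out.println(unescape(kv.get(0)) + " -> " + kv.get(1));
        }
    }
}
```

      The key point is that splitting must be escape-aware and unescaping must happen only after the split, so a delimiter inside a path is never confused with a real separator.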

      Attachments

        1. MAPRED-6596.001.patch
          7 kB
          Zac Hopkinson
        2. MAPRED-6596.002.patch
          7 kB
          Zac Hopkinson

          People

            Assignee: Unassigned
            Reporter: Zac Hopkinson (zac-hopkinson)
            Votes: 0
            Watchers: 1