  1. Hadoop Map/Reduce
  2. MAPREDUCE-5936

MultipleInputs incorrect output with copied Path

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: mrv1, mrv2
    • Labels:
      None

      Description

      The MultipleInputs class builds a Map with Path objects as keys and mapper-inputformat combinations as values.

      This is not correct behavior, because a Map holds only one value per key. If MultipleInputs.addInputPath is called twice with the same Path and (for example) two different Mapper classes, the second call silently overwrites the first.
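
The overwrite can be reproduced without Hadoop. The following is a minimal sketch, not the MultipleInputs source: the class name, the path string, and the mapper names are hypothetical, and String keys stand in for org.apache.hadoop.fs.Path so the example runs on a plain JDK.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Path-keyed map; not Hadoop code.
class PathKeyedMapDemo {
    static Map<String, String> register() {
        Map<String, String> mapperByPath = new HashMap<>();
        // First registration for the path.
        mapperByPath.put("/data/users", "LeftSideMapper");
        // Same key again: put() replaces the earlier value and raises no error.
        mapperByPath.put("/data/users", "RightSideMapper");
        return mapperByPath;
    }

    public static void main(String[] args) {
        // prints {/data/users=RightSideMapper}
        System.out.println(register());
    }
}
```

Only one entry survives, so a job configured this way would run the file through a single mapper instead of two.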

      The expected behavior is that the input file is processed once for each call to addInputPath.

      This matters for applications that perform join-like operations: joining a file with itself is valid, and the application developer should not have to detect that the same Path is included twice in order to work around this bug.

      MultipleInputs should use a multimap, or a map with List values, so that one Path can carry several mapper-inputformat combinations.
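
A map with List values could look roughly like the sketch below. It is an illustration of the suggested data structure only, not a patch: the class, the add helper, the path, and the mapper names are all hypothetical, and String again stands in for Path.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical list-valued map; not Hadoop code.
class PathMultimapDemo {
    // Appends a mapper to the list for this path, creating the list on first use.
    static void add(Map<String, List<String>> mappersByPath, String path, String mapper) {
        mappersByPath.computeIfAbsent(path, k -> new ArrayList<>()).add(mapper);
    }

    static Map<String, List<String>> register() {
        Map<String, List<String>> mappersByPath = new LinkedHashMap<>();
        add(mappersByPath, "/data/users", "LeftSideMapper");
        // Same path again: the second mapper is appended, not substituted.
        add(mappersByPath, "/data/users", "RightSideMapper");
        return mappersByPath;
    }

    public static void main(String[] args) {
        // prints {/data/users=[LeftSideMapper, RightSideMapper]}
        System.out.println(register());
    }
}
```

Both registrations are retained, which is what processing the file once per addInputPath call would require.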

            People

            • Assignee:
              Unassigned
            • Reporter:
              Bryan Jacobs (bjacobs)
            • Votes:
              1
            • Watchers:
              3

              Dates

              • Created:
                Updated: