Hadoop Map/Reduce / MAPREDUCE-5113

Streaming input/output types are ignored with java mapper/reducer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2-alpha
    • Fix Version/s: 2.1.0-beta
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      After MAPREDUCE-1888, when a Java mapper or reducer is used, StreamJob ignores stream.map.output and stream.reduce.output when setting the job's output key/value classes, even if the user sets these properties explicitly.

      As MAPREDUCE-1888 is not in branch-1, this change is only needed in Hadoop 2.
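
      The behavior the description asks for can be sketched roughly as follows. This is a minimal illustration, not the committed patch: a plain Map stands in for the job configuration, class names are plain strings so the example runs without Hadoop on the classpath, and the class StreamOutputTypeSketch, its methods, and the identifier-to-class mapping (text, rawbytes, typedbytes) are assumptions made for the example. Only the two property names come from the issue.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a plain Map stands in for the job configuration,
// and class names are plain strings so the example runs without Hadoop.
final class StreamOutputTypeSketch {

    // Property names as given in the issue description.
    static final String MAP_OUTPUT = "stream.map.output";
    static final String REDUCE_OUTPUT = "stream.reduce.output";

    static final String TEXT = "org.apache.hadoop.io.Text";
    static final String BYTES = "org.apache.hadoop.io.BytesWritable";
    static final String TYPED_BYTES = "org.apache.hadoop.typedbytes.TypedBytesWritable";

    // Hypothetical holder for the key/value class names chosen for a job.
    static final class OutputTypes {
        final String keyClass;
        final String valueClass;

        OutputTypes(String keyClass, String valueClass) {
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }
    }

    // Assumed identifier-to-class mapping. A missing or unknown identifier
    // falls back to the streaming default (Text).
    static OutputTypes resolve(String identifier) {
        if ("rawbytes".equals(identifier)) {
            return new OutputTypes(BYTES, BYTES);
        }
        if ("typedbytes".equals(identifier)) {
            return new OutputTypes(TYPED_BYTES, TYPED_BYTES);
        }
        return new OutputTypes(TEXT, TEXT);
    }

    // Map output types honor stream.map.output when the user sets it.
    static OutputTypes mapOutputTypes(Map<String, String> conf) {
        return resolve(conf.get(MAP_OUTPUT));
    }

    // Job output types honor stream.reduce.output when the user sets it.
    static OutputTypes reduceOutputTypes(Map<String, String> conf) {
        return resolve(conf.get(REDUCE_OUTPUT));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(REDUCE_OUTPUT, "typedbytes");

        OutputTypes map = mapOutputTypes(conf);
        OutputTypes reduce = reduceOutputTypes(conf);
        System.out.println("map output key class:    " + map.keyClass);
        System.out.println("reduce output key class: " + reduce.keyClass);
    }
}
```

      The point of the sketch is that the user's explicit setting wins regardless of whether the mapper or reducer is a Java class, and that the streaming default (Text) applies only when the property is absent.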

      1. HADOOP-9300.patch (12 kB, Sandy Ryza)
      2. HADOOP-9300.patch (13 kB, Sandy Ryza)
      3. HADOOP-9300-1.patch (14 kB, Sandy Ryza)
      4. HADOOP-9300-2.patch (18 kB, Sandy Ryza)
      5. HADOOP-9300-2.patch (18 kB, Sandy Ryza)
      6. HADOOP-9300-2.patch (17 kB, Sandy Ryza)
      7. HADOOP-9300-3.patch (16 kB, Sandy Ryza)
      8. MAPREDUCE-5113.patch (4 kB, Sandy Ryza)

        Activity

        Sandy Ryza created issue -
        Sandy Ryza made changes -
        Description
          Original Value: When a hadoop streaming job is run with a java class as the reducer, or no reducer specified (which defaults to IdentityReducer), the output key and value classes are not set. This can cause a job to fail down the line.
          New Value: In an effort to avoid overwriting user configs (MAPREDUCE-1888), StreamJob doesn't set a job's output key/value classes unless they are specified in the streaming command line. If the configs aren't specified in either of these places, the streaming defaults (Text) no longer kick in, and the global default LongWritable is used.

          This can cause jobs/output writers that are expecting Text to fail.
        Sandy Ryza made changes -
        Attachment HADOOP-9300.patch [ 12569120 ]
        Sandy Ryza made changes -
        Status
          Original Value: Open [ 1 ]
          New Value: Patch Available [ 10002 ]
        Sandy Ryza made changes -
        Attachment HADOOP-9300.patch [ 12569899 ]
        Sandy Ryza made changes -
        Attachment HADOOP-9300-1.patch [ 12570006 ]
        Sandy Ryza made changes -
        Attachment HADOOP-9300-2.patch [ 12570026 ]
        Sandy Ryza made changes -
        Attachment HADOOP-9300-2.patch [ 12570031 ]
        Sandy Ryza made changes -
        Attachment HADOOP-9300-2.patch [ 12570034 ]
        Sandy Ryza made changes -
        Project
          Original Value: Hadoop Common [ 12310240 ]
          New Value: Hadoop Map/Reduce [ 12310941 ]
        Key
          Original Value: HADOOP-9300
          New Value: MAPREDUCE-5113
        Affects Version/s 2.0.2-alpha [ 12322471 ]
        Affects Version/s 2.0.2-alpha [ 12322473 ]
        Component/s tools [ 12319643 ]
        Sandy Ryza made changes -
        Attachment HADOOP-9300-3.patch [ 12575946 ]
        Sandy Ryza made changes -
        Attachment MAPREDUCE-5113.patch [ 12576117 ]
        Sandy Ryza made changes -
        Summary
          Original Value: Streaming fails to set output key class when reducer is java class
          New Value: Streaming input/output types are ignored with java mapper/reducer
        Sandy Ryza made changes -
        Description
          Original Value: In an effort to avoid overwriting user configs (MAPREDUCE-1888), StreamJob doesn't set a job's output key/value classes unless they are specified in the streaming command line. If the configs aren't specified in either of these places, the streaming defaults (Text) no longer kick in, and the global default LongWritable is used.

          This can cause jobs/output writers that are expecting Text to fail.
          New Value: After MAPREDUCE-1888, with a java mapper or reducer, StreamJob doesn't respect stream.map.output/stream.reduce.output to set a job's output key/value classes.

          unless they are specified in the streaming command line. If the configs aren't specified in either of these places, the streaming defaults (Text) no longer kick in, and the global default LongWritable is used.

          This can cause jobs/output writers that are expecting Text to fail.
        Sandy Ryza made changes -
        Description
          Original Value: After MAPREDUCE-1888, with a java mapper or reducer, StreamJob doesn't respect stream.map.output/stream.reduce.output to set a job's output key/value classes.

          unless they are specified in the streaming command line. If the configs aren't specified in either of these places, the streaming defaults (Text) no longer kick in, and the global default LongWritable is used.

          This can cause jobs/output writers that are expecting Text to fail.
          New Value: After MAPREDUCE-1888, with a java mapper or reducer, StreamJob doesn't respect stream.map.output/stream.reduce.output when setting a job's output key/value classes, even if these configs are explicitly set by the user.

          As MAPREDUCE-1888 is not in branch-1, this change is only needed in hadoop 2.
        Alejandro Abdelnur made changes -
        Status
          Original Value: Patch Available [ 10002 ]
          New Value: Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 2.0.5-beta [ 12324032 ]
        Resolution Fixed [ 1 ]
        Arun C Murthy made changes -
        Status
          Original Value: Resolved [ 5 ]
          New Value: Closed [ 6 ]

          People

          • Assignee: Sandy Ryza
          • Reporter: Sandy Ryza
          • Votes: 1
          • Watchers: 7
