Hadoop Common / HADOOP-1284

clean up the protocol between stream mapper/reducer and the framework


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      Right now, the protocol between the streaming mapper/reducer and the framework is very inflexible.
      The mapper/reducer generates line-oriented output. The framework reads that output line by line and splits
      each line into a key/value pair. By default, the substring up to the first tab character is the key, and the
      substring after the first tab character is the value.
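
      As a minimal sketch of the default rule described above (this is not the framework's actual code,
      and the function name split_default is hypothetical), the split could look like this in Python.
      How a line without any tab is handled is an assumption here: the whole line is taken as the key
      and the value is left empty.

```python
from typing import Tuple


def split_default(line: str) -> Tuple[str, str]:
    """Split one line of streaming output at the first tab character."""
    # Drop the trailing newline that line-oriented output carries.
    line = line.rstrip("\n")
    # str.partition splits at the FIRST occurrence only, so any later
    # tabs stay inside the value.
    # Assumption: with no tab present, the key is the whole line and the
    # value is the empty string.
    key, _sep, value = line.partition("\t")
    return key, value
```

      For the line "user42<TAB>click<TAB>home", this yields the key "user42" and the value "click<TAB>home".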

      However, in many cases, the application wants some control over how the pair is split.
      Here, I'd like to introduce the following configuration variables for that:

      1. "streaming.output.field.separator": the field separator. The default is the tab character,
      but the user can specify a different one (e.g. ':' or ', ').
      A map output line is then treated as a list of fields separated by this separator.

      2. "streaming.num.fields.for.mapout.key": the number of leading fields that form the map output key
      (and that are used for sorting on the reduce side).
      The default value is 1.
      The remaining fields form the value. For example, I can specify the first 5 fields as my map output key.

      3. "streaming.num.fields.for.partitioning": Sometimes, I want to use fewer fields for partitioning than for
      the key, to achieve the "primary/secondary" composite key effect proposed in HADOOP-485. The default value is 1.
      For example, I can set "streaming.num.fields.for.partitioning" to 3
      and "streaming.num.fields.for.mapout.key" to 5.
      Records are then partitioned on fields 1 to 3 and sorted on fields 1 to 5, so fields 4 and 5 act as my secondary key.
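
      To make the interaction of the three variables concrete, here is a hedged Python sketch of the proposed
      split. It only illustrates the semantics described in this issue and is not the code in the attached patch.
      The names SplitRecord and split_record and the parameter names are hypothetical. The parameters stand in for
      "streaming.output.field.separator", "streaming.num.fields.for.mapout.key" and
      "streaming.num.fields.for.partitioning" respectively.

```python
from typing import NamedTuple


class SplitRecord(NamedTuple):
    key: str            # used as the map output key (and for sorting)
    value: str          # everything after the key fields
    partition_key: str  # prefix of the key used to pick the reducer


def split_record(line: str,
                 separator: str = "\t",
                 num_key_fields: int = 1,
                 num_partition_fields: int = 1) -> SplitRecord:
    """Split one map output line into key, value and partition key."""
    # Treat the line as a list of fields separated by `separator`.
    fields = line.rstrip("\n").split(separator)
    # The first `num_key_fields` fields form the key, the rest the value.
    key = separator.join(fields[:num_key_fields])
    value = separator.join(fields[num_key_fields:])
    # Partitioning uses a (usually shorter) prefix of the same fields.
    partition_key = separator.join(fields[:num_partition_fields])
    return SplitRecord(key, value, partition_key)


# The example from the description: partition on 3 fields, key on 5.
rec = split_record("a:b:c:d:e:f:g", separator=":",
                   num_key_fields=5, num_partition_fields=3)
```

      In this example, rec.partition_key is "a:b:c", rec.key is "a:b:c:d:e" and rec.value is "f:g", so fields 4 and 5
      ("d" and "e") only influence ordering within a partition. With all parameters left at their defaults, the
      function reduces to splitting at the first tab.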

      With the default values above, the behavior is identical to the current one,
      so the change stays backward compatible while introducing a desirable new feature in a clean way.

      Thoughts?

        Attachments

        1. patch-1284.txt
          35 kB
          Runping Qi


            People

            • Assignee:
              runping Runping Qi
              Reporter:
              runping Runping Qi
