Hadoop Common / HADOOP-542

on-the-fly merge sort, HADOOP-540, reformat

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None
    • Environment: Tested on Linux and Windows

    Description

      A large patch for streaming. Changes:

      Support for on-the-fly merge sort of multiple map input files.
      This assumes that each input is already sorted.
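The idea of merging already-sorted inputs can be illustrated with a k-way merge over a priority queue. This is only a minimal sketch, not the code from the patch: the class name `SortedMergeSketch`, the `merge` method, and the use of in-memory string lists in place of map input files are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of the patch: merges already-sorted inputs.
class SortedMergeSketch {

  // One queue entry: the current head line of an input plus the rest of that input.
  private static final class Head implements Comparable<Head> {
    final String line;
    final Iterator<String> rest;

    Head(String line, Iterator<String> rest) {
      this.line = line;
      this.rest = rest;
    }

    @Override
    public int compareTo(Head other) {
      return line.compareTo(other.line);
    }
  }

  // Merges inputs that are each sorted in String natural order.
  // The result is only sorted if every input was sorted to begin with.
  static List<String> merge(List<? extends Iterable<String>> sortedInputs) {
    PriorityQueue<Head> heads = new PriorityQueue<>();
    for (Iterable<String> input : sortedInputs) {
      Iterator<String> it = input.iterator();
      if (it.hasNext()) {
        heads.add(new Head(it.next(), it));
      }
    }
    List<String> merged = new ArrayList<>();
    while (!heads.isEmpty()) {
      // Emit the smallest head, then refill the queue from the same input.
      Head smallest = heads.poll();
      merged.add(smallest.line);
      if (smallest.rest.hasNext()) {
        heads.add(new Head(smallest.rest.next(), smallest.rest));
      }
    }
    return merged;
  }
}
```

The sketch collects the result in a list for brevity. A streaming variant would write each line out as soon as it is polled, so only one pending line per input is held in memory at a time.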

      Support for reducer-NONE side effects: map tasks read DFS inputs and write to a single local output.
      This can be used to do an on-the-fly merge sort of remote sorted files.
      (Compare to DFSShell -getmerge, which concatenates remote sorted files.)
      The single output can be a regular file, a named pipe, or a socket.
      URI syntax: -mapsideoutput file:/C:/win
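To show how a value in this URI form breaks down, here is a small sketch using the standard `java.net.URI` class. The class name `OutputUriSketch` and the `describe` method are hypothetical, and the sketch makes no claim about how the patch itself parses the `-mapsideoutput` argument.

```java
import java.net.URI;

// Hypothetical helper: splits an output URI into its scheme and path.
class OutputUriSketch {

  // Returns "<scheme> <path>" for a hierarchical URI such as file:/C:/win.
  static String describe(String spec) {
    URI uri = URI.create(spec);
    return uri.getScheme() + " " + uri.getPath();
  }
}
```

For the example value from the description, the scheme is `file` and the path is `/C:/win`.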

      Add an optional JUnit test for the on-the-fly merge sort.
      It requires Unix tools and also works under Cygwin.

      When consuming a stderr line from the application, call reporter.setStatus()
      if more than 10 seconds have passed since the last such call.
      Calling setStatus with reducer-NONE was already added as part of HADOOP-413,
      so together these changes resolve HADOOP-540.
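The time-based throttling described here might look roughly like the following. It is a sketch under stated assumptions: the class `ThrottledStatus` and the method `onStderrLine` are hypothetical, a `Consumer<String>` stands in for `reporter.setStatus()`, and the choice to always forward the first line is an assumption that the description does not confirm.

```java
import java.util.function.Consumer;

// Hypothetical sketch; the patch itself calls reporter.setStatus() on a Hadoop Reporter.
class ThrottledStatus {
  private final long intervalMillis;
  private final Consumer<String> sink; // stands in for reporter.setStatus()
  private long lastReportMillis;
  private boolean reportedOnce = false;

  ThrottledStatus(long intervalMillis, Consumer<String> sink) {
    this.intervalMillis = intervalMillis;
    this.sink = sink;
  }

  // Forwards the line only if more than intervalMillis has passed since the
  // last forwarded line. Assumption: the very first line is always forwarded.
  // Returns true if the line was forwarded as a status update.
  boolean onStderrLine(String line, long nowMillis) {
    if (reportedOnce && nowMillis - lastReportMillis <= intervalMillis) {
      return false;
    }
    sink.accept(line);
    lastReportMillis = nowMillis;
    reportedOnce = true;
    return true;
  }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the method, keeps the sketch easy to test. A caller would typically pass `System.currentTimeMillis()` and an interval of 10000.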

      Reformat the streaming code to conform to Hadoop conventions
      (2-space indent, opening brace on the same line).

      Attachments

        1. bigmux.patch2
          195 kB
          Michel Tourn

          People

            Assignee: Unassigned
            Reporter: Michel Tourn (michel_tourn)
            Votes: 0
            Watchers: 1
