  1. Hadoop Common
  2. HADOOP-444

In streaming with a NONE reducer, you get duplicate files if a mapper fails, is restarted, and succeeds next time.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

      Description

      When the dust settled after a streaming run, the directory ended up looking like this:

      /user/dking/<project-name>/K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_0 <r 3> 10563406
      /user/dking/<project-name>/K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_1 <r 3> 10563406

      Both attempts of the same map task (suffixes _0 and _1) left an output file, so future processing of this directory will receive duplicated data.

      -dk
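
A minimal sketch of how the duplicates in a listing like the one above could be spotted before downstream processing. It is not part of Hadoop: `find_duplicate_attempts` is a hypothetical helper, and the `task_<job>_<m|r>_<task>_<attempt>` naming pattern is an assumption inferred only from the two file names in the description.

```python
import re
from collections import defaultdict

# Assumed naming pattern, inferred from the listing in this report:
# task_<job>_<m|r>_<task>_<attempt>
ATTEMPT_RE = re.compile(r"^(task_\d+_[mr]_\d+)_(\d+)$")


def find_duplicate_attempts(paths):
    """Return {task_id: [attempt numbers]} for tasks with more than one output file.

    Hypothetical helper for illustration; it only inspects file names.
    """
    attempts_by_task = defaultdict(list)
    for path in paths:
        # Keep only the last path component (the output file name).
        name = path.rstrip("/").rsplit("/", 1)[-1]
        match = ATTEMPT_RE.match(name)
        if match:
            attempts_by_task[match.group(1)].append(int(match.group(2)))
    return {
        task: sorted(attempts)
        for task, attempts in attempts_by_task.items()
        if len(attempts) > 1
    }


# The two paths reported above, copied as-is (including the placeholder).
listing = [
    "/user/dking/<project-name>/K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_0",
    "/user/dking/<project-name>/K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_1",
]
print(find_duplicate_attempts(listing))
```

A check like this only flags candidate duplicates by name. Deciding which attempt's file to keep would still need information from the job itself, which the listing does not provide.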

            People

            • Assignee: Michel Tourn (michel_tourn)
            • Reporter: Dick King (dking)
