Hadoop Common / HADOOP-444

In streaming with a NONE reducer, you get duplicate files if a mapper fails, is restarted, and succeeds next time.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

    Description

      When the dust settled after a streaming run, the directory ended up looking like this:

      /user/dking/<project-name>/K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_0 <r 3> 10563406
      /user/dking/<project-name>/K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_1 <r 3> 10563406

      Both files come from the same map task (attempts 0 and 1), so any later job that reads this directory will receive that task's data twice.

      -dk
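
One way to spot this condition in an output directory is to group file names by task and flag any task that has more than one attempt file. The following is a minimal sketch, not part of the fix for this issue. It assumes file names follow the `task_<job>_<m|r>_<task>_<attempt>` pattern seen in the listing above; the function name `find_duplicate_attempts` and the third entry in the sample listing are hypothetical, added only for illustration.

```python
import re
from collections import defaultdict

# Assumption: attempt outputs are named task_<job>_<m|r>_<task>_<attempt>,
# matching the two file names in the listing above.
ATTEMPT_RE = re.compile(r"^(task_\d+_[mr]_\d+)_(\d+)$")


def find_duplicate_attempts(paths):
    """Return {task_id: [attempt numbers]} for tasks with more than one output file."""
    by_task = defaultdict(list)
    for path in paths:
        base = path.rsplit("/", 1)[-1]  # keep only the file name
        match = ATTEMPT_RE.match(base)
        if match:
            by_task[match.group(1)].append(int(match.group(2)))
    return {
        task: sorted(attempts)
        for task, attempts in by_task.items()
        if len(attempts) > 1
    }


# The first two names are taken from the listing above; the third is a
# hypothetical task with a single successful attempt, added for contrast.
listing = [
    "K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_0",
    "K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_1",
    "K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007385_0",
]

duplicates = find_duplicate_attempts(listing)
print(duplicates)  # {'task_0026_m_007384': [0, 1]}
```

A check like this only detects the duplicates after the fact; it does not decide which attempt's file should be kept.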

          People

            Assignee: Michel Tourn (michel_tourn)
            Reporter: Dick King (dking)