Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-723

Race condition exists in the method MapOutputLocation.getFile

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None

      Description

      There seems to be a race condition in the way the Reduces copy the map output files from the Maps. If a copier is blocked in the connect method (in the beginning of the method MapOutputLocation.getFile) to a Jetty on a Map, and the MapCopyLeaseChecker detects that the copier was idle for too long, it will go ahead and issue a interrupt (read 'kill') to this thread and create a new Copier thread. However, the copier, currently blocked trying to connect to Jetty on a Map, doesn't actually get killed until the connect timeout expires and as soon as the connect comes out (with an IOException), it will delete the map output file which actually could have been (successfully) created by the new Copier thread. This leads to the Sort phase for that reducer failing with a FileNotFoundException.
      One simple way to fix this is to not delete the file if the file was not created within this getFile method.

        Attachments

          Activity

            People

            • Assignee:
              owen.omalley Owen O'Malley
              Reporter:
              devaraj Devaraj Das
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: