Hadoop Map/Reduce
  MAPREDUCE-4933

MR1 final merge asks for length of file it just wrote before flushing it

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.1
    • Fix Version/s: 1.2.0
    • Component/s: mrv1, task
    • Labels: None

      Description

      createKVIterator in ReduceTask contains the following code:

      
                try {
                  Merger.writeFile(rIter, writer, reporter, job);
                  addToMapOutputFilesOnDisk(fs.getFileStatus(outputPath));
                } catch (Exception e) {
                  if (null != outputPath) {
                    fs.delete(outputPath, true);
                  }
                  throw new IOException("Final merge failed", e);
                } finally {
                  if (null != writer) {
                    writer.close();
                  }
                }
      

      Merger#writeFile() does not close the file after writing it, so when fs.getFileStatus() is called on it, it may not return the correct length. This causes bad accounting further down the line, which can lead to map output data being lost.
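
      A minimal sketch of the reordering the fix calls for (the committed patch may differ in its details): close the writer, and thereby flush its data, before asking the filesystem for the file's status.

                try {
                  Merger.writeFile(rIter, writer, reporter, job);
                  // Close (and flush) the output before querying its length, so
                  // getFileStatus() reports the size of the fully written file.
                  writer.close();
                  writer = null;
                  addToMapOutputFilesOnDisk(fs.getFileStatus(outputPath));
                } catch (Exception e) {
                  if (null != outputPath) {
                    fs.delete(outputPath, true);
                  }
                  throw new IOException("Final merge failed", e);
                } finally {
                  if (null != writer) {
                    writer.close();
                  }
                }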

        Activity

        Robert Joseph Evans added a comment -

        Can we really lose map data? If so this is a Blocker not a Major.

        Sandy Ryza added a comment -

        If I understand how things work, yes. The length is used to calculate onDiskBytes. If onDiskBytes is 0 (which can happen falsely without the patch), the following code won't get called:

                final int numInMemSegments = memDiskSegments.size();
                diskSegments.addAll(0, memDiskSegments);
                memDiskSegments.clear();
                RawKeyValueIterator diskMerge = Merger.merge(
                    job, fs, keyClass, valueClass, codec, diskSegments,
                    ioSortFactor, numInMemSegments, tmpDir, comparator,
                    reporter, false, spilledRecordsCounter, null);
                diskSegments.clear();
                if (0 == finalSegments.size()) {
                  return diskMerge;
                }
                finalSegments.add(new Segment<K,V>(
                      new RawKVIteratorReader(diskMerge, onDiskBytes), true));
        

        which if I understand correctly means that some on-disk data won't be incorporated in the final merge.
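
        To make the accounting concrete, here is a hypothetical sketch (not the literal ReduceTask code) of how a prematurely read length can zero out onDiskBytes and skip the disk-merge branch above:

        // Hypothetical sketch: mapOutputFilesOnDisk holds the FileStatus recorded
        // for each on-disk map output. If the status for the just-written merge
        // file was taken before the writer was closed, its length can read as 0,
        // so the sum below comes out as 0 and the disk merge is never performed.
        long onDiskBytes = 0;
        for (FileStatus status : mapOutputFilesOnDisk) {
          onDiskBytes += status.getLen();
        }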

        Sandy Ryza added a comment -

        The background to coming across this was trying to apply MAPREDUCE-2264, which caused a bunch of test failures for me due to missing map data. MAPREDUCE-2264 annotates the FileStatus with extra data, but doesn't change the timing of writer.close()/getFileStatus(). The code has been around since 2008 and must be working, so I would not be surprised if the map data loss is a false alarm, but at least for MAPREDUCE-2264's sake it should be fixed nonetheless.

        Alejandro Abdelnur added a comment -

        The patch looks right to me even if it is not losing data in practice. Bobby, would you agree?

        Robert Joseph Evans added a comment -

        I agree that the patch looks correct, and should go in either way. I just wanted to know how critical this is, and it looks critical.

        Robert Joseph Evans added a comment -

        Oh by the way +1.

        I am happy to check this in, but I just added Matt Foley so he can say exactly where he wants us to check this in. In the meantime I am just going to put it into branch-1 (1.2.0-SNAPSHOT).

        Alejandro Abdelnur added a comment -

        great, thx. +1, I'll commit it in a few.

        Alejandro Abdelnur added a comment -

        Thx Bobby, please go ahead and commit it then.

        Robert Joseph Evans added a comment -

        OK I have dug into this a bit more and I am not as scared as I was previously. I could definitely be wrong so if someone could confirm my analysis on this I would appreciate it.

        So there are two places where we could get issues. The first is because we get the file size after writing and without closing the file. This is not a huge issue because that size is only used to approximate how much space will be needed for the final merge file, which we just wrote out so who cares.

        The second case is a race between writing the merge data out to a file and reading it back in. This will only happen on the final merge when the reducer has too much buffered in memory and we need to spill to disk (we exceeded mapred.job.reduce.input.buffer.percent). But even then it becomes a race between Java's buffered IO getting pushed out to disk and another part of the code reading it back in. I have a hard time believing that we will ever lose that race. I think we can move this back to Major, sorry about scaring everyone.
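
        For reference, a hypothetical configuration enabling that spill path (illustrative value only, assuming the MR1 JobConf API):

        // Illustrative only: retain map outputs in up to 70% of the reducer heap
        // once the shuffle finishes; anything beyond that threshold is merged to
        // disk during the final merge, which is the path discussed above.
        JobConf conf = new JobConf();
        conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.70f);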

        Arun C Murthy added a comment -

        Sandy - have you seen this in branch-2?

        Sandy Ryza added a comment -

        It's not a problem in branch-2. The corresponding code, in MergeManager#finalMerge, does the correct thing, i.e. doesn't ask for the file's length before closing it.

        Steve Loughran added a comment -

        When using a blobstore as the destination, such as the s3:// filesystem, none of the data is written until the stream is closed; the result of the getFileStatus() call could even be a file not found.
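
        A hypothetical illustration of that hazard (fs, outputPath, and mergedBytes are stand-ins, not code from the patch):

        // On an object store such as s3://, nothing becomes visible until the
        // stream is closed, so querying the path between write() and close()
        // can throw FileNotFoundException rather than merely report a short length.
        FSDataOutputStream out = fs.create(outputPath);
        out.write(mergedBytes);                            // buffered client-side
        FileStatus status = fs.getFileStatus(outputPath);  // may fail on a blobstore
        out.close();                                       // object materializes here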

        Matt Foley added a comment -

        Added 1.2.0 to the fix version, per CHANGES.txt and the above comments.

        Matt Foley added a comment -

        Closed upon release of Hadoop 1.2.0.


          People

          • Assignee: Sandy Ryza
          • Reporter: Sandy Ryza
          • Votes: 0
          • Watchers: 10
