Nutch / NUTCH-814

SegmentMerger bug

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.1
    • Component/s: None
    • Labels: None

    Description

      Dennis reported:

      In the SegmentMerger.java file about line 150 we have this:

      final SequenceFile.Reader reader =
          new SequenceFile.Reader(FileSystem.get(job), fSplit.getPath(), job);

      Then about line 166 in the record reader we have this:

      boolean res = reader.next(key, w);

      If I am reading that right, that would mean that the map task would loop
      over all records for a given file and not just a given split.

      Right, this should use SequenceFileRecordReader instead, which already has the logic to handle splits. A patch is coming shortly; thanks for spotting this! This could be the cause of the "out of disk space" errors that many users have reported.
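
To illustrate why ignoring the split bounds matters, here is a minimal, self-contained sketch. It does not use the Hadoop or Nutch APIs: the class `SplitReadSketch`, its two methods, and the record offsets are all hypothetical stand-ins. It also simplifies split handling to "keep records that begin inside the split's byte range"; real SequenceFile splits are aligned on sync markers, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the SegmentMerger or Hadoop code.
final class SplitReadSketch {

    // Pattern described in the report: the reader is opened on the whole file
    // and next() is called until the file ends, so the split bounds are ignored.
    static List<Long> readWholeFile(long[] recordOffsets, long splitStart, long splitLength) {
        List<Long> seen = new ArrayList<>();
        for (long offset : recordOffsets) {
            seen.add(offset);
        }
        return seen;
    }

    // Split-aware pattern (simplified): keep only the records that begin
    // inside the byte range [splitStart, splitStart + splitLength).
    static List<Long> readSplitOnly(long[] recordOffsets, long splitStart, long splitLength) {
        long splitEnd = splitStart + splitLength;
        List<Long> seen = new ArrayList<>();
        for (long offset : recordOffsets) {
            if (offset >= splitStart && offset < splitEnd) {
                seen.add(offset);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Assumed layout: six records in one file, divided into three splits.
        long[] offsets = {0, 100, 200, 300, 400, 500};
        long[][] splits = {{0, 200}, {200, 200}, {400, 200}};

        int wholeFileTotal = 0;
        int splitOnlyTotal = 0;
        for (long[] split : splits) {
            wholeFileTotal += readWholeFile(offsets, split[0], split[1]).size();
            splitOnlyTotal += readSplitOnly(offsets, split[0], split[1]).size();
        }
        System.out.println("records emitted, whole-file reads: " + wholeFileTotal);
        System.out.println("records emitted, split-aware reads: " + splitOnlyTotal);
    }
}
```

With three splits over a six-record file, the whole-file pattern emits every record once per split (18 in total), while the split-aware pattern emits each record exactly once (6 in total). Duplication of this kind, multiplied by the number of splits per file, would be consistent with the disk space problems mentioned above.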

      Attachments

        1. merger.patch (6 kB, Andrzej Bialecki)

          People

            Assignee: Andrzej Bialecki (ab)
            Reporter: Dennis Kubes (musepwizard)