Hadoop Common / HADOOP-874

merge code is really slow


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: io
    • Labels: None

    Description

      I had a case where the map output buffer size (io.sort.mb) was set too low, which caused a spill and a merge. With the configuration fixed, the map did not spill until it had finished. With the spill, each map took 9.5 minutes; without it, each map took 45 seconds. I therefore assume the 2-file merge accounted for the difference, roughly 9 minutes (570 s - 45 s = 525 s). That is really slow. The inputs to the merge were two 25 MB sequence files, block compressed with the default codec (Java implementation).
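      To put the reported timings in perspective, the back-of-the-envelope sketch below derives the implied merge time and throughput. It assumes, as the description does, that the whole difference between the spilling and non-spilling runs is spent in the 2-file merge, and that the merge reads the 50 MB of compressed input once. The class and method names are hypothetical and used only for illustration.

```java
import java.util.Locale;

// Hypothetical helper: estimates merge cost from the per-map timings in this report.
// Assumption: all extra time in the spilling run is spent in the 2-file merge.
public class MergeEstimate {

    // Extra seconds per map attributed to the spill + merge.
    static double mergeSeconds(double withSpillSec, double withoutSpillSec) {
        return withSpillSec - withoutSpillSec;
    }

    // Effective merge throughput over the (compressed) input size.
    static double throughputMbPerSec(double inputMb, double mergeSec) {
        return inputMb / mergeSec;
    }

    public static void main(String[] args) {
        double withSpillSec = 9.5 * 60;  // 9.5 minutes per map with the spill
        double withoutSpillSec = 45;     // 45 seconds per map without it
        double inputMb = 2 * 25;         // two 25 MB block-compressed sequence files

        double mergeSec = mergeSeconds(withSpillSec, withoutSpillSec);
        double mbPerSec = throughputMbPerSec(inputMb, mergeSec);

        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
        System.out.printf(Locale.ROOT, "implied merge time: %.0f s (%.2f min)%n",
                mergeSec, mergeSec / 60);
        System.out.printf(Locale.ROOT, "implied merge throughput: %.3f MB/s%n", mbPerSec);
    }
}
```

      Under these assumptions the merge takes about 525 s and moves well under 0.1 MB/s of compressed input, which is why a 2-file merge of this size looks far slower than it should be.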

      Attachments

        1. merge-no-seek.patch (5 kB, Devaraj Das)


          People

            Assignee: Devaraj Das (ddas)
            Reporter: Owen O'Malley (omalley)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: