Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-141

Disk thrashing / task timeouts during map output copy phase

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.3.0
    • None
    • None
    • linux

    Description

      MapOutputProtocol connections cause timeouts because of system thrashing and transferring the same file over and over again, ultimately leading to making no forward progress(medium sized job, 500GB input file, map output about as large as the input, 10 node cluster).

      There are several bugs behind this, but the following two changes improved matters considerably.

      (1)

      The buffersize in MapOutputFile is currently hardcoded to 8192 bytes (for both reads and writes). By changing this buffer size to 256KB, the number of disk seeks are reduced and the problem went away.

      Ideally there would be a buffer size parameter for this that is separate from the DFS io buffer size.

      (2)

      I also added the following code to the socket configuration in both Server.java and Client.java. No linger is a minor good idea in an enivronment with some packet loss (and you will have that when all the nodes get busy at once), but 256KB buffers is probably excessive, especially on a LAN, but it takes me two hours to test changes so I havent experimented.

      socket.setSendBufferSize(256*1024);
      socket.setReceiveBufferSize(256*1024);
      socket.setSoLinger(false, 0);
      socket.setKeepAlive(true);

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            omalley Owen O'Malley Assign to me
            psutter Peter Sutter
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment