Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-4946

Type conversion of map completion events leads to performance problems with large jobs

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 2.0.2-alpha, 0.23.5
    • 2.0.3-alpha, 0.23.7
    • mr-am
    • None

    Description

      We've seen issues with large jobs (e.g.: 13,000 maps and 3,500 reduces) where reducers fail to connect back to the AM after being launched due to connection timeout. Looking at stack traces of the AM during this time we see a lot of IPC servers stuck waiting for a lock to get the application ID while type converting the map completion events. What's odd is that normally getting the application ID should be very cheap, but in this case we're type-converting thousands of map completion events for each reducer connecting. That means we end up type-converting the map completion events over 45 million times during the lifetime of the example job (13,000 * 3,500).

      We either need to make the type conversion much cheaper (i.e.: lockless or at least read-write locked) or, even better, store the completion events in a form that does not require type conversion when serving them up to reducers.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            jlowe Jason Darrell Lowe Assign to me
            jlowe Jason Darrell Lowe
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment