Apache Tez / TEZ-2186

OOM with a simple scatter-gather job with container re-use

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

    Description

      With a no-op 20K x 2K scatter-gather job running on a 20-node cluster with twenty 2 GB containers per node, the reducers fail with OOM errors. I haven't been able to generate a heap dump yet; I will add details as they are found.
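      To put the reported setup in perspective, here is a rough sizing sketch. It assumes "20K x 2K" means 20,000 source tasks feeding 2,000 destination (reduce) tasks over a scatter-gather edge, where each destination task reads one partition from every source task. The helper name `job_scale` is hypothetical, and the per-container averages assume tasks are spread evenly across re-used containers, which is a simplification. It only sizes the job; it does not identify the cause of the OOM, which the report leaves open.

```python
# Rough sizing of the job described in TEZ-2186.
# Assumption: "20K x 2K" = 20,000 source tasks x 2,000 destination tasks.
SOURCE_TASKS = 20_000
DEST_TASKS = 2_000
NODES = 20
CONTAINERS_PER_NODE = 20
CONTAINER_MEMORY_GB = 2


def job_scale(source_tasks, dest_tasks, nodes, containers_per_node, container_gb):
    """Return basic scale figures for a scatter-gather job (hypothetical helper)."""
    containers = nodes * containers_per_node
    return {
        "containers": containers,
        "cluster_memory_gb": containers * container_gb,
        # In a scatter-gather edge, every destination task reads one
        # partition from every source task.
        "inputs_per_dest_task": source_tasks,
        # One logical partition per (source, destination) pair; this counts
        # partitions, not network requests.
        "logical_partitions": source_tasks * dest_tasks,
        # Average tasks per container, assuming an even spread with
        # container re-use (a simplifying assumption).
        "avg_source_tasks_per_container": source_tasks / containers,
        "avg_dest_tasks_per_container": dest_tasks / containers,
    }


if __name__ == "__main__":
    scale = job_scale(SOURCE_TASKS, DEST_TASKS, NODES,
                      CONTAINERS_PER_NODE, CONTAINER_MEMORY_GB)
    for name, value in scale.items():
        print(f"{name}: {value}")
```

      With the reported numbers this gives 400 containers (800 GB in total), 20,000 inputs per reducer, and 40 million logical partitions overall, so each reducer tracks a large number of inputs even when the processors themselves do no work.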

      Attachments

        1. noopexample.txt (6 kB, Siddharth Seth)
        2. TEZ-2186.1.patch (4 kB, Rajesh Balamohan)
        3. TEZ-2186.2.patch (4 kB, Rajesh Balamohan)
        4. TEZ-2186-branch-0.6.patch (4 kB, Rajesh Balamohan)

          People

            Assignee: Rajesh Balamohan (rajesh.balamohan)
            Reporter: Siddharth Seth (sseth)
            Votes: 0
            Watchers: 3
