Hadoop Common
  1. Hadoop Common
  2. HADOOP-3104

MultithreadMapRunner keeps consuming records even if trheads are not available

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.16.1
    • Fix Version/s: 0.16.2
    • Component/s: None
    • Labels:
      None
    • Environment:

      all

      Description

      The ExecutorService execute() method does not block when there are not threads available, it queues up the runnables until there are threads.

      The problem is that all key/values are read and kept in memory for the task, with large datasets this will create a OOM exception.

      Have to figure out how to use the execute in blocking fashion.

      1. patch3104.txt
        7 kB
        Alejandro Abdelnur
      2. patch3104.txt
        5 kB
        Alejandro Abdelnur
      3. patch3104-2.txt
        7 kB
        Owen O'Malley
      4. patch3104-3.txt
        7 kB
        Owen O'Malley

        Activity

        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #445 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/445/ )
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12378815/patch3104-3.txt
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 3 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac -1. The applied patch generated 568 javac compiler warnings (more than the trunk's current 567 warnings).

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2089/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2089/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2089/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2089/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378815/patch3104-3.txt against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 3 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac -1. The applied patch generated 568 javac compiler warnings (more than the trunk's current 567 warnings). release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests -1. The patch failed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2089/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2089/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2089/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2089/console This message is automatically generated.
        Hide
        Owen O'Malley added a comment -

        I just committed this. Thanks, Alejandro!

        Show
        Owen O'Malley added a comment - I just committed this. Thanks, Alejandro!
        Hide
        Chris Douglas added a comment -

        +1 Looks good

        Show
        Chris Douglas added a comment - +1 Looks good
        Hide
        Owen O'Malley added a comment -

        This patch overloads offer as well as add in addition to the previous one.

        Show
        Owen O'Malley added a comment - This patch overloads offer as well as add in addition to the previous one.
        Hide
        Nigel Daley added a comment -

        Hudson is down right now. The machine is getting new memory.

        In the interest of time, I ran the test-patch target on this. Here's the output:

        -1 overall.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 3 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac -1. The applied patch generated 568 javac compiler warnings (more than the trunk's current 567 warnings).

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        The javac issue is that a new private class does not have a serialVersionUID which is acceptable.

        Show
        Nigel Daley added a comment - Hudson is down right now. The machine is getting new memory. In the interest of time, I ran the test-patch target on this. Here's the output: -1 overall. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 3 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac -1. The applied patch generated 568 javac compiler warnings (more than the trunk's current 567 warnings). findbugs +1. The patch does not introduce any new Findbugs warnings. The javac issue is that a new private class does not have a serialVersionUID which is acceptable.
        Hide
        Chris Douglas added a comment -

        Internally, ThreadPoolExecutor uses offer() rather than add() when the queue is saturated. With that minor change, +1

        Show
        Chris Douglas added a comment - Internally, ThreadPoolExecutor uses offer() rather than add() when the queue is saturated. With that minor change, +1
        Hide
        Owen O'Malley added a comment -

        I replaced the submitJob, followed by polling with runJob. I also replaced the BlockingTaskExecutor with a TaskExecutor that uses a really blocking queue.

        Show
        Owen O'Malley added a comment - I replaced the submitJob, followed by polling with runJob. I also replaced the BlockingTaskExecutor with a TaskExecutor that uses a really blocking queue.
        Hide
        Devaraj Das added a comment -

        +1

        Show
        Devaraj Das added a comment - +1
        Hide
        Devaraj Das added a comment -

        Shouldn't it log it continue to busy wait?

        This should be okay given the context is a task execution and not a daemon execution

        Show
        Devaraj Das added a comment - Shouldn't it log it continue to busy wait? This should be okay given the context is a task execution and not a daemon execution
        Hide
        Amar Kamat added a comment -

        Instead of treating InterruptedException as a noop, it would be better to throw an IOException with the InterruptedException as its cause.

        Shouldn't it log it continue to busy wait?

        Show
        Amar Kamat added a comment - Instead of treating InterruptedException as a noop, it would be better to throw an IOException with the InterruptedException as its cause. Shouldn't it log it continue to busy wait?
        Hide
        Alejandro Abdelnur added a comment -

        per Chris suggestion, catch blocks of InterruptedException now rethrow the exception, but as a RuntimeException (instead an IOException as he suggested) as I think is more appropriate as it is not an IO issue.

        per Devaraj's suggestion, refactored to use wait-notify in a ThreadPoolExecutor subclass that uses the wait-notify to make the execute blocking if there are not threads. There is no need to use a rejection handler anymore. Also refactor the exception check into a method and use that method in the different parts of the maprunner instead duplicating the exception check code.

        Show
        Alejandro Abdelnur added a comment - per Chris suggestion, catch blocks of InterruptedException now rethrow the exception, but as a RuntimeException (instead an IOException as he suggested) as I think is more appropriate as it is not an IO issue. per Devaraj's suggestion, refactored to use wait-notify in a ThreadPoolExecutor subclass that uses the wait-notify to make the execute blocking if there are not threads. There is no need to use a rejection handler anymore. Also refactor the exception check into a method and use that method in the different parts of the maprunner instead duplicating the exception check code.
        Hide
        Devaraj Das added a comment -

        It'd be better to have wait-notify mechanism instead of sleep.

        Show
        Devaraj Das added a comment - It'd be better to have wait-notify mechanism instead of sleep.
        Hide
        Amar Kamat added a comment -

        Here are some comments
        1) The javadoc comments should not mention the default value. That might change and will require code change too. So you can keep the earlier comment as is and just add the comment about the wait parameter.
        2) I think mapred.map.multithreadedrunner.backoff seems more appropriate than mapred.map.multithreadedrunner.waitwhennothreads, comments?
        3) 10ms seems too short. I was wondering what if we double it everytime. Something like 10,20,40,80 ...

        Show
        Amar Kamat added a comment - Here are some comments 1) The javadoc comments should not mention the default value. That might change and will require code change too. So you can keep the earlier comment as is and just add the comment about the wait parameter. 2) I think mapred.map.multithreadedrunner.backoff seems more appropriate than mapred.map.multithreadedrunner.waitwhennothreads, comments? 3) 10ms seems too short. I was wondering what if we double it everytime. Something like 10,20,40,80 ...
        Hide
        Chris Douglas added a comment -
        • Instead of treating InterruptedException as a noop, it would be better to throw an IOException with the InterruptedException as its cause.
        • Does the wait between attempts need to be configurable?
        Show
        Chris Douglas added a comment - Instead of treating InterruptedException as a noop, it would be better to throw an IOException with the InterruptedException as its cause. Does the wait between attempts need to be configurable?
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12378727/patch3104.txt
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included -1. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        javadoc -1. The javadoc tool appears to have generated 1 warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2079/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2079/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2079/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2079/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378727/patch3104.txt against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included -1. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. javadoc -1. The javadoc tool appears to have generated 1 warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs -1. The patch appears to introduce 1 new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2079/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2079/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2079/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2079/console This message is automatically generated.
        Hide
        Alejandro Abdelnur added a comment -

        figured out how to use properly the ThreadPoolExecutor to block execute invocations (instead queuing them) until there are trheads avail.

        Show
        Alejandro Abdelnur added a comment - figured out how to use properly the ThreadPoolExecutor to block execute invocations (instead queuing them) until there are trheads avail.

          People

          • Assignee:
            Alejandro Abdelnur
            Reporter:
            Alejandro Abdelnur
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development