Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-3916

DrainDispatcher#await should wait till event has been completely handled

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Duplicate
    • 2.7.0
    • None
    • None
    • None

    Description

      DrainDispatcher#await should wait till event has been completely handled.
      Currently it only checks for whether event queue has become empty.

      And in many tests we directly check for a state to be changed after calling await.
      Sometimes, the states do not change by the time we check them as event has not been completely handled.

      This is causing test failures such as YARN-3909 and YARN-3910 and may cause other test failures as well.

      Attachments

        1. YARN-3916.01.patch
          3 kB
          Varun Saxena
        2. YARN-3916.02.patch
          3 kB
          Varun Saxena

        Issue Links

          Activity

            varun_saxena Varun Saxena added a comment -

            In YARN-3878 we introduced check for whether event queue is empty in DrainDispatcher#await.
            Even pre YARN-3878, code was essentially doing the same but that was through a volatile flag. That may have failed sometimes as well.
            But changes to volatile flag were not seen by other thread as quickly as checking for event queue being empty hence few of these tests were not failing and allowing async dispatcher to handle the event.

            We should ideally check whether event has been handled in addition to event queue being empty.

            varun_saxena Varun Saxena added a comment - In YARN-3878 we introduced check for whether event queue is empty in DrainDispatcher#await. Even pre YARN-3878 , code was essentially doing the same but that was through a volatile flag. That may have failed sometimes as well. But changes to volatile flag were not seen by other thread as quickly as checking for event queue being empty hence few of these tests were not failing and allowing async dispatcher to handle the event. We should ideally check whether event has been handled in addition to event queue being empty.
            varun_saxena Varun Saxena added a comment -

            Pre YARN-3878 code would have actually worked for DrainDispatcher.

            varun_saxena Varun Saxena added a comment - Pre YARN-3878 code would have actually worked for DrainDispatcher.
            varun_saxena Varun Saxena added a comment -

            Updated patch

            varun_saxena Varun Saxena added a comment - Updated patch
            hadoopqa Hadoop QA added a comment -



            +1 overall



            Vote Subsystem Runtime Comment
            0 pre-patch 16m 12s Pre-patch trunk compilation is healthy.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
            +1 javac 7m 41s There were no new javac warning messages.
            +1 javadoc 9m 41s There were no new javadoc warning messages.
            +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
            +1 checkstyle 0m 56s There were no new checkstyle issues.
            +1 whitespace 0m 0s The patch has no lines that end in whitespace.
            +1 install 1m 21s mvn install still works.
            +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
            +1 findbugs 1m 34s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
            +1 yarn tests 1m 56s Tests passed in hadoop-yarn-common.
                40m 19s  



            Subsystem Report/Notes
            Patch URL http://issues.apache.org/jira/secure/attachment/12744910/YARN-3916.01.patch
            Optional Tests javadoc javac unit findbugs checkstyle
            git revision trunk / 1df39c1
            hadoop-yarn-common test log https://builds.apache.org/job/PreCommit-YARN-Build/8511/artifact/patchprocess/testrun_hadoop-yarn-common.txt
            Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8511/testReport/
            Java 1.7.0_55
            uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Console output https://builds.apache.org/job/PreCommit-YARN-Build/8511/console

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 pre-patch 16m 12s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 1 new or modified test files. +1 javac 7m 41s There were no new javac warning messages. +1 javadoc 9m 41s There were no new javadoc warning messages. +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 0m 56s There were no new checkstyle issues. +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 install 1m 21s mvn install still works. +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse. +1 findbugs 1m 34s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 yarn tests 1m 56s Tests passed in hadoop-yarn-common.     40m 19s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12744910/YARN-3916.01.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 1df39c1 hadoop-yarn-common test log https://builds.apache.org/job/PreCommit-YARN-Build/8511/artifact/patchprocess/testrun_hadoop-yarn-common.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8511/testReport/ Java 1.7.0_55 uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-YARN-Build/8511/console This message was automatically generated.
            leftnoteasy Wangda Tan added a comment -

            Thanks varun_saxena investigating this problem, I think patch generally looks good. I suggest to add some comments about why add a processingEvents variable.

            leftnoteasy Wangda Tan added a comment - Thanks varun_saxena investigating this problem, I think patch generally looks good. I suggest to add some comments about why add a processingEvents variable.
            varun_saxena Varun Saxena added a comment -

            Thanks leftnoteasy for reviewing the patch.
            Have added a comment as you said and updated a new patch.

            varun_saxena Varun Saxena added a comment - Thanks leftnoteasy for reviewing the patch. Have added a comment as you said and updated a new patch.
            hadoopqa Hadoop QA added a comment -



            +1 overall



            Vote Subsystem Runtime Comment
            0 pre-patch 16m 13s Pre-patch trunk compilation is healthy.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
            +1 javac 7m 42s There were no new javac warning messages.
            +1 javadoc 9m 35s There were no new javadoc warning messages.
            +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
            +1 checkstyle 0m 56s There were no new checkstyle issues.
            +1 whitespace 0m 0s The patch has no lines that end in whitespace.
            +1 install 1m 21s mvn install still works.
            +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
            +1 findbugs 1m 34s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
            +1 yarn tests 1m 56s Tests passed in hadoop-yarn-common.
                40m 16s  



            Subsystem Report/Notes
            Patch URL http://issues.apache.org/jira/secure/attachment/12744944/YARN-3916.02.patch
            Optional Tests javadoc javac unit findbugs checkstyle
            git revision trunk / d7319de
            hadoop-yarn-common test log https://builds.apache.org/job/PreCommit-YARN-Build/8513/artifact/patchprocess/testrun_hadoop-yarn-common.txt
            Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8513/testReport/
            Java 1.7.0_55
            uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Console output https://builds.apache.org/job/PreCommit-YARN-Build/8513/console

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 pre-patch 16m 13s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 1 new or modified test files. +1 javac 7m 42s There were no new javac warning messages. +1 javadoc 9m 35s There were no new javadoc warning messages. +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 0m 56s There were no new checkstyle issues. +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 install 1m 21s mvn install still works. +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse. +1 findbugs 1m 34s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 yarn tests 1m 56s Tests passed in hadoop-yarn-common.     40m 16s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12744944/YARN-3916.02.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / d7319de hadoop-yarn-common test log https://builds.apache.org/job/PreCommit-YARN-Build/8513/artifact/patchprocess/testrun_hadoop-yarn-common.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8513/testReport/ Java 1.7.0_55 uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-YARN-Build/8513/console This message was automatically generated.
            leftnoteasy Wangda Tan added a comment -

            varun_saxena,
            Thanks for updating, but I think the comment may not accurate enough if I understand correctly: the flag == true when dispatcher is processing events OR dispatcher has pending events, maybe rename it to something like "isIdle" is more accurate.

            Thoughts?

            leftnoteasy Wangda Tan added a comment - varun_saxena , Thanks for updating, but I think the comment may not accurate enough if I understand correctly: the flag == true when dispatcher is processing events OR dispatcher has pending events, maybe rename it to something like "isIdle" is more accurate. Thoughts?

            Skimmed through the patch. Would it be simpler to have a boolean flag (eventBeingProcessed) turned on only when an event is being handled, and isProcessingEvents be !eventQueue.isEmpty() || eventBeingProcessed?

            kasha Karthik Kambatla added a comment - Skimmed through the patch. Would it be simpler to have a boolean flag ( eventBeingProcessed ) turned on only when an event is being handled, and isProcessingEvents be !eventQueue.isEmpty() || eventBeingProcessed ?

            How about keeping earlier variable drained only, this makes more appropriate I think.

            rohithsharma Rohith Sharma K S added a comment - How about keeping earlier variable drained only, this makes more appropriate I think.
            varun_saxena Varun Saxena added a comment -

            I thought of this first.
            But issue here is that LinkedBlockingQueue#take is a blocking queue. It wont return from the call if there are no pending events.
            If I set the flag before this call, it is possible that we are stuck forever there.
            If I set it afterwards there can be a minor race wherein event queue may have taken from queue and made it empty but not updated the flag to true. In this case, we may wrongly judge that no events are being processed.

            varun_saxena Varun Saxena added a comment - I thought of this first. But issue here is that LinkedBlockingQueue#take is a blocking queue. It wont return from the call if there are no pending events. If I set the flag before this call, it is possible that we are stuck forever there. If I set it afterwards there can be a minor race wherein event queue may have taken from queue and made it empty but not updated the flag to true. In this case, we may wrongly judge that no events are being processed.
            varun_saxena Varun Saxena added a comment -

            Yeah variable can be named as drained. Because we name the dispatcher as DrainDispatcher as well.
            I can mention in comments as to what it means.

            varun_saxena Varun Saxena added a comment - Yeah variable can be named as drained. Because we name the dispatcher as DrainDispatcher as well. I can mention in comments as to what it means.
            jianhe Jian He added a comment -

            Won't draining events on serviceStop have the same problem ?

            Actually wondered why that flag was added in the first place, this should be the reason.
            I think we can revert the change of YARN-3878 and fix the problem of YARN-3878 properly ?

            jianhe Jian He added a comment - Won't draining events on serviceStop have the same problem ? Actually wondered why that flag was added in the first place, this should be the reason. I think we can revert the change of YARN-3878 and fix the problem of YARN-3878 properly ?
            varun_saxena Varun Saxena added a comment -

            Won't draining events on serviceStop have the same problem ?

            Yes, you are correct. It will have the same problem. We may judge that queue is empty and go on to interrupt the thread while event is being handled.

            I think we can revert the change of YARN-3878 and fix the problem of YARN-3878 properly ?

            Yeah lets do that. I think when Interrupted exception is thrown on put(issue in YARN-3878), we can reset the flag to false if queue is empty.
            Thoughts ?

            varun_saxena Varun Saxena added a comment - Won't draining events on serviceStop have the same problem ? Yes, you are correct. It will have the same problem. We may judge that queue is empty and go on to interrupt the thread while event is being handled. I think we can revert the change of YARN-3878 and fix the problem of YARN-3878 properly ? Yeah lets do that. I think when Interrupted exception is thrown on put(issue in YARN-3878 ), we can reset the flag to false if queue is empty. Thoughts ?
            varun_saxena Varun Saxena added a comment -

            jianhe, I have an added and addendum patch on YARN-3878.
            It adds the previous drained flag, reset it on InterruptedException and kept the bits related to YARN-3878 which were required.

            varun_saxena Varun Saxena added a comment - jianhe , I have an added and addendum patch on YARN-3878 . It adds the previous drained flag, reset it on InterruptedException and kept the bits related to YARN-3878 which were required.
            jianhe Jian He added a comment -

            thanks varun_saxena ! re-open YARN-3878 and close this as a dup of that.
            will look at YARN-3878.

            jianhe Jian He added a comment - thanks varun_saxena ! re-open YARN-3878 and close this as a dup of that. will look at YARN-3878 .

            People

              varun_saxena Varun Saxena
              varun_saxena Varun Saxena
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: