Hadoop Map/Reduce / MAPREDUCE-1834

TestSimulatorDeterministicReplay times out on trunk

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.22.0
    • Component/s: contrib/mumak
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    1. TestSimulatorDeterministicReplay.log
      1.30 MB
      Amareshwari Sriramadasu
    2. mr-1834-20100802.patch
      7 kB
      Hong Tang
    3. mr-1834-20100729.patch
      6 kB
      Hong Tang
    4. mr-1834-20100727.patch
      6 kB
      Hong Tang
    5. MAPREDUCE-1834.patch
      1 kB
      Hemanth Yamijala

      Activity

      Amareshwari Sriramadasu added a comment -

      Test timeout log.

      Amareshwari Sriramadasu added a comment -

      Test times out in branch 0.21 also.

      Hemanth Yamijala added a comment -

      I spent some time looking at this problem. The timeout most likely comes from the use of Process.waitFor(), which is known to hang when the launched process writes to its output or error stream and nothing reads it. diff prints the differences it finds, so when the directories differ its output fills the buffer and Process.waitFor() never returns. The standard pattern in Hadoop has been to replace this with ShellCommandExecutor.execute(). The attached patch makes this change. Unfortunately, the test now fails every time.

      As far as I can understand, the test launches two identical runs of MapReduce jobs and diffs the output history directories to make sure they have identical content. But history operations can happen asynchronously. When I ran the test, I saw all sorts of differences between the directories being diffed: the contents and sizes of history files differed, as did the set of files moved to the DONE folder. I think the move to the DONE folder is certainly asynchronous. I don't know whether some buffering of history data is also causing the history files to have differing contents.

      Given all this, I doubt this test case will ever pass. I don't know enough about its intent to fix it, though. Can anyone help?
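
      The Process.waitFor() hang described above can be avoided by reading the child's output before waiting for it to exit. The following is a minimal sketch using only java.lang.ProcessBuilder, not the ShellCommandExecutor change in the attached patch; the class name DrainedProcess is hypothetical and the directory names in the comment are placeholders.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Hadoop: runs a command and drains its output. */
final class DrainedProcess {
  final int exitCode;
  final String output;

  private DrainedProcess(int exitCode, String output) {
    this.exitCode = exitCode;
    this.output = output;
  }

  static DrainedProcess run(List<String> command)
      throws IOException, InterruptedException {
    ProcessBuilder builder = new ProcessBuilder(command);
    // Merge stderr into stdout so a single reader drains both streams.
    builder.redirectErrorStream(true);
    Process process = builder.start();

    ByteArrayOutputStream captured = new ByteArrayOutputStream();
    try (InputStream in = process.getInputStream()) {
      byte[] buffer = new byte[8192];
      int n;
      // Read until EOF *before* waiting. Otherwise a chatty child can block
      // on a full pipe while the parent blocks in waitFor().
      while ((n = in.read(buffer)) != -1) {
        captured.write(buffer, 0, n);
      }
    }
    int exitCode = process.waitFor();
    return new DrainedProcess(exitCode, captured.toString());
  }

  public static void main(String[] args) throws Exception {
    if (args.length == 0) {
      System.err.println("usage: DrainedProcess <command> [args...]");
      return;
    }
    // Example arguments: diff -r -q dirA dirB (placeholder directory names).
    DrainedProcess result = run(Arrays.asList(args));
    System.out.println("exit=" + result.exitCode);
    System.out.print(result.output);
  }
}
```

      Draining first means the amount of output no longer matters, so a verbose "diff -r" on two very different directories would report a failure instead of timing out.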

      Hong Tang added a comment -

      I was just pointed to this JIRA while working with Anirban on MAPREDUCE-1253. The test case launches "diff -r" and hangs when the output buffer is full. I changed the command to "diff -r -q" and it fails fast.

      I am taking a look at the problem now.

      Hong Tang added a comment -

      I should mention that the test was introduced as part of MAPREDUCE-1306, and I just verified that the test passed right after that commit.

      Hong Tang added a comment -

      With "git bisect" (and a lot of manual hacks), I pinpointed the source of the problem: MAPREDUCE-1372. Apparently, we need to intercept ConcurrentHashMap and change it to a LinkedHashMap in src/contrib/mumak/src/java/org/apache/hadoop/mapred/DeterministicCollectionAspects.aj.
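
      Why the map type matters for a deterministic replay can be shown in a small sketch. LinkedHashMap iterates in insertion order, while ConcurrentHashMap makes no promise about iteration order: it depends on hash codes, and for keys that use identity hash codes it can differ between JVM runs. The class name and the tracker names below are made up for illustration and do not come from mumak.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class IterationOrderDemo {
  /** Inserts the keys in the given order and returns the map's iteration order. */
  static List<String> iterationOrder(Map<String, Integer> map, List<String> keys) {
    for (int i = 0; i < keys.size(); i++) {
      map.put(keys.get(i), i);
    }
    return new ArrayList<String>(map.keySet());
  }

  public static void main(String[] args) {
    // Hypothetical tracker names, chosen only for illustration.
    List<String> keys = Arrays.asList("tracker_c", "tracker_a", "tracker_b");
    // LinkedHashMap: always the insertion order.
    System.out.println(iterationOrder(new LinkedHashMap<String, Integer>(), keys));
    // ConcurrentHashMap: an order derived from hash codes, with no guarantee.
    System.out.println(iterationOrder(new ConcurrentHashMap<String, Integer>(), keys));
  }
}
```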

      Hong Tang added a comment -

      It is a bit tricky to intercept the constructor of ConcurrentHashMap because there is no ConcurrentLinkedHashMap. I implemented a fake concurrent hash map that does not support concurrency (OK because mumak runs as a single thread) and uses a LinkedHashMap for internal storage.
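
      The idea of a fake concurrent map can be sketched as follows. This is an illustration of the approach described above, not the FakeConcurrentHashMap class from the patch; the class name is hypothetical, the map is deliberately not thread-safe, and unlike ConcurrentHashMap it does not reject null keys or values.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;

/** Illustrative only: NOT thread-safe, and not the class from the patch. */
final class InsertionOrderedFakeConcurrentMap<K, V> extends AbstractMap<K, V>
    implements ConcurrentMap<K, V> {
  // LinkedHashMap gives a deterministic (insertion) iteration order.
  private final Map<K, V> delegate = new LinkedHashMap<K, V>();

  @Override public Set<Map.Entry<K, V>> entrySet() { return delegate.entrySet(); }
  @Override public V get(Object key) { return delegate.get(key); }
  @Override public boolean containsKey(Object key) { return delegate.containsKey(key); }
  @Override public V put(K key, V value) { return delegate.put(key, value); }
  @Override public V remove(Object key) { return delegate.remove(key); }

  @Override public V putIfAbsent(K key, V value) {
    if (delegate.containsKey(key)) {
      return delegate.get(key);
    }
    delegate.put(key, value);
    return null;
  }

  /** Removes the entry only if the key is currently mapped to the given value. */
  @Override public boolean remove(Object key, Object value) {
    if (delegate.containsKey(key) && Objects.equals(delegate.get(key), value)) {
      delegate.remove(key);
      return true;
    }
    return false;
  }

  @Override public boolean replace(K key, V oldValue, V newValue) {
    if (delegate.containsKey(key) && Objects.equals(delegate.get(key), oldValue)) {
      delegate.put(key, newValue);
      return true;
    }
    return false;
  }

  @Override public V replace(K key, V value) {
    return delegate.containsKey(key) ? delegate.put(key, value) : null;
  }
}
```

      The four ConcurrentMap methods are plain check-then-act sequences here, which is only safe under the single-thread assumption stated above.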

      Amareshwari Sriramadasu added a comment -

      Resubmitting the patch for Hudson.

      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12450573/mr-1834-20100727.patch
      against trunk revision 980316.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      -1 findbugs. The patch appears to introduce 2 new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/338/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/338/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/338/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/338/console

      This message is automatically generated.

      Hong Tang added a comment -

      Patch fixing the Findbugs warnings.

      Amareshwari Sriramadasu added a comment -

      Submitting for Hudson.

      Hong Tang added a comment -

      Test-patch results on my local machine.

           [exec] +1 overall.
           [exec]
           [exec]     +1 @author.  The patch does not contain any @author tags.
           [exec]
           [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
           [exec]
           [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
           [exec]
           [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
           [exec]
           [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
           [exec]
           [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
      
      Tamas Sarlos added a comment -

      Hong, thanks for finding the root cause. By and large the patch looks OK to me, but it has one major and three minor issues.

      major:

      • FakeConcurrentHashMap.remove(Object key, Object value) does not remove the (key, value) pair.

      minor:

      • The test used to pass if diff is not available, e.g. on Windows (Runtime.exec threw an IOException). Now it fails because of the Assert.fail added to the catch block.
        Could you either update the comments to reflect this decision, or adjust the code so that the test is still skipped on these platforms?
      • Replacing the above Runtime.exec with ShellCommandExecutor as well would make the code more uniform.
      • I suggest adding some comments to FakeConcurrentHashMap.java on the need for and use of that class, as described in this JIRA.

      Amareshwari & Hemanth, thanks for catching the issue with Process.waitFor().
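
      On the first minor point, one way to make the comparison independent of an external diff binary is to compare the two directory trees in Java. This is a sketch of an alternative, not what the patch does; the class name DirectoryCompare is hypothetical, it assumes Java 8 or later, it reads each file fully into memory, and it ignores empty directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Hypothetical helper: compares two directory trees without calling diff. */
final class DirectoryCompare {
  /** Maps each regular file's path (relative to root) to its bytes. */
  private static Map<String, byte[]> snapshot(Path root) throws IOException {
    Map<String, byte[]> files = new TreeMap<>();
    try (Stream<Path> paths = Files.walk(root)) {
      for (Path p : (Iterable<Path>) paths::iterator) {
        if (Files.isRegularFile(p)) {
          files.put(root.relativize(p).toString(), Files.readAllBytes(p));
        }
      }
    }
    return files;
  }

  /** True if both trees hold the same relative file paths with identical bytes. */
  static boolean sameContents(Path a, Path b) throws IOException {
    Map<String, byte[]> left = snapshot(a);
    Map<String, byte[]> right = snapshot(b);
    if (!left.keySet().equals(right.keySet())) {
      return false;
    }
    for (Map.Entry<String, byte[]> e : left.entrySet()) {
      if (!Arrays.equals(e.getValue(), right.get(e.getKey()))) {
        return false;
      }
    }
    return true;
  }
}
```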

      Hong Tang added a comment -

      @tamas, thanks for the review comments. Will upload a patch soon.

      Hong Tang added a comment -

      New patch addresses Tamas's review comments.

      Hong Tang added a comment -

      Unit tests for mumak and test-patch pass on my local machine.

           [exec] +1 overall.  
           [exec] 
           [exec]     +1 @author.  The patch does not contain any @author tags.
           [exec] 
           [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
           [exec] 
           [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
           [exec] 
           [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
           [exec] 
           [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
           [exec] 
           [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
      
      test:
           [echo] contrib: mumak
         [delete] Deleting directory /Users/htang/Documents/Work/workspace/hadoop-mapreduce/build/contrib/mumak/test/logs
          [mkdir] Created dir: /Users/htang/Documents/Work/workspace/hadoop-mapreduce/build/contrib/mumak/test/logs
          [junit] WARNING: multiple versions of ant detected in path for junit 
          [junit]          jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
          [junit]      and jar:file:/Users/htang/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
          [junit] Running org.apache.hadoop.mapred.TestRemoveIpsFromLoggedNetworkTopology
          [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.599 sec
          [junit] Running org.apache.hadoop.mapred.TestSimulatorDeterministicReplay
          [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 34.976 sec
          [junit] Running org.apache.hadoop.mapred.TestSimulatorEndToEnd
          [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 116.936 sec
          [junit] Running org.apache.hadoop.mapred.TestSimulatorEngine
          [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.122 sec
          [junit] Running org.apache.hadoop.mapred.TestSimulatorEventQueue
          [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.041 sec
          [junit] Running org.apache.hadoop.mapred.TestSimulatorJobClient
          [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.097 sec
          [junit] Running org.apache.hadoop.mapred.TestSimulatorJobTracker
          [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.231 sec
          [junit] Running org.apache.hadoop.mapred.TestSimulatorSerialJobSubmission
          [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 208.964 sec
          [junit] Running org.apache.hadoop.mapred.TestSimulatorStressJobSubmission
          [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 112.813 sec
          [junit] Running org.apache.hadoop.mapred.TestSimulatorTaskTracker
          [junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.857 sec
      
      Tamas Sarlos added a comment -

      +1 on the revised patch.

      Mahadev konar added a comment -

      +1. I'll go ahead and commit the patch.

      Mahadev konar added a comment -

      I just committed this. Thanks, Hong.

      Hudson added a comment -

      Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)

      arunkumar added a comment -

      I did an svn checkout of common, hdfs and mapreduce from http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.21.0/
      I built all three, and the build was successful for each.
      From /mapreduce/src/contrib/mumak/ I ran "ant" and "ant test".

      The former succeeded, but the latter reported a timeout and a failure at:

      Running org.apache.hadoop.mapred.TestSimulatorDeterministicReplay
      Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 30.647 sec

      I tried to apply the latest patch attached here, but it gives:

      patching file DeterministicCollectionAspects.aj
      Hunk #1 FAILED at 21.
      Hunk #2 FAILED at 42.
      2 out of 2 hunks FAILED -- saving rejects to file DeterministicCollectionAspects.aj.rej
      patching file FakeConcurrentHashMap.java
      patching file TestSimulatorDeterministicReplay.java
      Hunk #2 succeeded at 26 with fuzz 2 (offset 1 line).
      Hunk #3 succeeded at 50 (offset 1 line).

      In fact, the svn checkout contains no DeterministicCollectionAspects.aj or FakeConcurrentHashMap.java files.
      I don't know enough to fix this. Any help?

      arunkumar added a comment -

      More info:
      In the svn checkout, HADOOP_HOME/mapreduce/src/contrib/mumak/src/test/org/apache/hadoop/mapred/
      contains no DeterministicCollectionAspects.aj or FakeConcurrentHashMap.java files.


        People

        • Assignee:
          Hong Tang
          Reporter:
          Amareshwari Sriramadasu
        • Votes:
          0
          Watchers:
          3
