ZOOKEEPER-86

intermittent test failure of org.apache.zookeeper.test.AsyncTest

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: tests
    • Labels:
      None
    • Environment:

      OS X and Linux. The test sometimes passes, but on OS X it fails on most runs.

      Description

      Will attach the test output in an attachment...

      1. patch_for_ZOOKEEPER-86.patch
        5 kB
        james strachan
      2. TEST-org.apache.zookeeper.test.AsyncTest.txt
        1.50 MB
        james strachan

        Issue Links

          Activity

          james strachan created issue -
          james strachan added a comment -

          Here's the output from a run on OS X (Leopard).

          james strachan made changes -
          Field Original Value New Value
          Attachment TEST-org.apache.zookeeper.test.AsyncTest.txt [ 12386735 ]
          james strachan added a comment -

          about to attach

          james strachan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          james strachan added a comment -

          This patch seems to fix the test case, on OS X at least. I've split the test case into two parts (so they are forked separately) and added more delays before trying to rebind to the server socket, which seems to fix the error.

          james strachan made changes -
          Attachment patch_for_ZOOKEEPER-86.patch [ 12386748 ]
          james strachan added a comment -

          BTW, I have still sometimes seen AsyncHammerTest fail on OS X. The basic issue is the restart of the quorum servers: often, for the third one, the server socket has not yet been released by the OS, which tends to cause the failure. While things seem to work much better now, we might want to add a bigger sleep between restarts if the failure becomes more common again.
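
A rebind failure of this kind can occur when the OS still holds the old listening address. One common mitigation, not confirmed to be what the ZooKeeper test harness does, is to enable SO_REUSEADDR explicitly before binding, since the Java API leaves its initial setting undefined. A minimal sketch, in which `ReusableBind` and `bindReusable` are hypothetical names used only for illustration:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReusableBind {
    /** Opens a listening socket on the given port with SO_REUSEADDR enabled. */
    static ServerSocket bindReusable(int port) throws IOException {
        ServerSocket server = new ServerSocket(); // created unbound
        try {
            // Set before bind(); the effect of changing it after binding is not defined.
            server.setReuseAddress(true);
            server.bind(new InetSocketAddress(port));
            return server;
        } catch (IOException e) {
            server.close(); // do not leak the socket if configuration or bind fails
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port; a restart would reuse the same number.
        ServerSocket first = bindReusable(0);
        int port = first.getLocalPort();
        first.close();

        ServerSocket second = bindReusable(port); // simulated restart on the same port
        System.out.println("rebound on port " + port + ": " + second.isBound());
        second.close();
    }
}
```

Whether this removes the need for a delay depends on the platform and on why the address is still held, so it is a candidate to try rather than a guaranteed fix.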

          Patrick Hunt made changes -
          Assignee james strachan [ jstrachan ]
          Patrick Hunt added a comment -

          The exceptions in the log like this:

          2008-07-23 17:57:15,449 - WARN [SendThread:ClientCnxn$SendThread@726] - Closing:
          java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
          at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:491)
          at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:712)

          are the same issue as ZOOKEEPER-63: the client has asked the server to close the connection but hasn't recorded that fact, so when the server closes it (read returns -1) the client complains.
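
For reference, a return value of -1 from a channel read is how Java NIO reports end-of-stream, that is, an orderly close by the peer rather than an I/O failure. The sketch below is not ZooKeeper code; `EofReadDemo` and `readAfterPeerClose` are hypothetical names, and the 4-byte direct buffer only mirrors the one shown in the log:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class EofReadDemo {
    /** Connects to a local server that closes immediately and returns what read() reports. */
    static int readAfterPeerClose() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             SocketChannel client = SocketChannel.open(
                     new InetSocketAddress(loopback, server.getLocalPort()))) {
            // The server side accepts the connection and closes it without sending data.
            server.accept().close();

            ByteBuffer lenBuf = ByteBuffer.allocateDirect(4);
            // Blocking read: returns -1 once the peer's close has been received.
            return client.read(lenBuf);
        }
    }

    public static void main(String[] args) throws IOException {
        int rc = readAfterPeerClose();
        System.out.println(rc == -1 ? "peer closed the connection (EOF)" : "read " + rc + " bytes");
    }
}
```

A client that remembers it requested the close can treat this result as expected and skip the warning.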

          I don't like adding delays since it results in the unit tests taking forever (they already take a lot more time than they should; almost all of that time is spent in sleeps). IMO tests should run very quickly so that we're more likely to run them.

          We really need a better way of handling this - see ZOOKEEPER-61 which already captures this issue with excessive/unnecessary sleep.

          -1 on this patch until the two issues, 61 and 63, are addressed and we can be certain of a successful fix.

          It would be great if you could tackle this test "harness" issue. There are at least 3 JIRAs (86/61/63) related to this. Hudson has intermittent failures as well. Feel free to collapse these 3 bugs into 1 JIRA if it makes sense to have a single patch for all of them (or "link" them together and submit a patch against one).
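
One way to avoid fixed sleeps when waiting for a port to be released is to poll with a deadline, so the test proceeds as soon as the port is free and fails fast if it never is. This is only a sketch of that idea, not the approach taken in ZOOKEEPER-61 or ZOOKEEPER-111; `PortWait`, `waitForPortFree`, and the 50 ms poll interval are assumptions made for illustration:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.concurrent.TimeUnit;

public class PortWait {
    private static final long POLL_INTERVAL_MS = 50; // assumed value; tune per environment

    /** Returns true as soon as the port can be bound, or false once timeoutMs has elapsed. */
    static boolean waitForPortFree(int port, long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            try (ServerSocket probe = new ServerSocket(port)) {
                return true; // bind succeeded; the probe socket is closed on exit
            } catch (IOException e) {
                // Port still in use (or otherwise unbindable); retry until the deadline.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(POLL_INTERVAL_MS);
        }
    }

    public static void main(String[] args) throws Exception {
        ServerSocket held = new ServerSocket(0);
        int port = held.getLocalPort();
        System.out.println("free while held: " + waitForPortFree(port, 200));
        held.close();
        System.out.println("free after close: " + waitForPortFree(port, 2000));
    }
}
```

The probe only shows that the port was free at the moment of the check; another process could still take it before the server binds, so the server's own bind should remain the final authority.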

          Patrick Hunt made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Patrick Hunt made changes -
          Link This issue relates to ZOOKEEPER-111 [ ZOOKEEPER-111 ]
          Patrick Hunt added a comment -

          The patch for ZOOKEEPER-111 may address these issues. There may still be some timing issues to resolve; also, the close bug tracked in ZOOKEEPER-63 still exists.

          Patrick Hunt made changes -
          Link This issue relates to ZOOKEEPER-63 [ ZOOKEEPER-63 ]
          Mahadev konar made changes -
          Fix Version/s 3.3.0 [ 12313976 ]
          Mahadev konar added a comment -

          Closing this issue; this no longer happens.

          Mahadev konar made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Patrick Hunt made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              james strachan
              Reporter:
              james strachan
