Description
3 issues:
In zoo_add_auth: there is a race condition:
2940 // ZOOKEEPER-800 zoo_add_auth should return ZINVALIDSTATE if
2941 // the connection is closed.
2942 if (zoo_state(zh) == 0)
when we do zookeeper_init, the state is initialized to 0 and above we check if state = 0 then throw exception.
There is a race condition where the doIo thread is slow and has not changed the state to CONNECTING, then you end up returning back ZKINVALIDSTATE.
The problem is we use 0 for CLOSED state and UNINITIALIZED state. in case of uninitialized case it should let it go through.
2nd issue:
Another Bug: in send_auth_info, the check is not correct
while (auth->next != NULL) { //--BUG: in cases where there is only one auth in the list, this will never send that auth, as its next will be NULL
rc = send_info_packet(zh, auth);
auth = auth->next;
}
FIX IS:
do
while (auth != NULL); //this will make sure that even if there is one auth ,that will get sent.
3rd issue:
2965 add_last_auth(&zh->auth_h, authinfo);
2966 zoo_unlock_auth(zh);
2967
2968 if(zh->state == ZOO_CONNECTED_STATE || zh->state == ZOO_ASSOCIATING_STATE)
2969 return send_last_auth_info(zh);
if it is connected, we only send the last_auth_info, which may be different than the one we added, as we unlocked it before sending it.
Attachments
Attachments
- ZOOKEEPER-1108.patch
- 4 kB
- Dheeraj Agrawal
- ZOOKEEPER-1108.patch
- 3 kB
- Dheeraj Agrawal
- ZOOKEEPER-1108.patch
- 3 kB
- Dheeraj Agrawal
- ZOOKEEPER-1108.patch
- 3 kB
- Dheeraj Agrawal
- ZOOKEEPER-1108.patch
- 0.6 kB
- Dheeraj Agrawal
Activity
another bug
303 static void mark_active_auth(zhandle_t *zh) {
304 auth_list_head_t auth_h = zh->auth_h;
305 auth_info *element;
306 if (auth_h.auth == NULL)
309 element = auth_h.auth;
310 while (element->next != NULL)
314 }
315
Here as well we are skipping the first node in the list and in case of N nodes , we will skip processing the last one.
It should be changed to{no format}
310 while (element != NULL)
{ {no format}Not sure if this is a bug or behavior:
When we get a auth response, every time we process any auth_response, we call all the auth completions (might be registered by different add_auth_info calls). Is that intended? Or should we be calling only the one that the request came from? I guess we dont know for which request the response corresponds to? If the requests are processed in FIFO and response are got in order then may be we can figure out which add_auth info request the response corresponds to.
We never remove entries from the auth_list, is that intended?
Also the logging is misleading.
1193 if (rc)
{ 1194 LOG_ERROR(("Authentication scheme %s failed. Connection closed.", 1195 zh->auth_h.auth->scheme)); 1196 }1197 else
{ 1198 LOG_INFO(("Authentication scheme %s succeeded", zh->auth_h.auth->scheme)); 1199 }If there are multiple auth_info in the auth_list , we always print success/failure for ONLY the first one. So if i had two auths for scehmes, ABCD and EFGH and my auth scheme EFGH failed, the logs will still say ABCD failed.
Attached patch for 2 bugs listed in here, where the list traversal is not correct.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12484681/ZOOKEEPER-1108.patch
against trunk revision 1140017.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/358//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/358//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/358//console
This message is automatically generated.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12484681/ZOOKEEPER-1108.patch
against trunk revision 1144087.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/385//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/385//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/385//console
This message is automatically generated.
I can.. what do you recommend. we can check in this 2 bug fixes and resolve this and i can open another tracker for outstanding issues.
Opened following 2 trackers for the outstanding issues.
https://issues.apache.org/jira/browse/ZOOKEEPER-1126
https://issues.apache.org/jira/browse/ZOOKEEPER-1127
The attached patch is for issue: where the auth requests (first one) are not sent to server if the client has not yet connected to server (and state has not yet transitioned into ZOO_CONNECTING) , due to bug in send_auth_info. to write a test case for this, i had to fix ZOOKEEPER-1126 first, as without that fix, zoo_add_auth will always return ZKINVALIDSTATE.
I will update and close ZOOKEEPER-1126 (as its patch is also available in this).
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12486924/ZOOKEEPER-1108.patch
against trunk revision 1146961.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/400//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/400//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/400//console
This message is automatically generated.
Mahadev/Michi, is there anything remaining / pending for me in this patch to be able to be checked in?
We would like this patch to be accepted and included in 3.4
Also a note regarding the tests, All tests pass when i run it locally using gcc-3.4.6 and gcc-3.2.3. Not sure why it complained here.
Dheeraj, Looks like some really bad failure in the tests. Can you take a look and see?
[exec] Zookeeper_watchers::testChildWatcher2 : elapsed 103 : OK [exec] [exec] *** glibc detected *** ./zktest-mt: malloc(): memory corruption: 0x00002affc7a83010 *** [exec] [exec] ======= Backtrace: ========= [exec] [exec] OK (57) [exec] [exec] /lib/libc.so.6[0x2affc88c701f]
Here is the link to console output:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/400/console
Yeah, i tired to run them locally and i didn't see any issues. Let me rerun with a clean checkout. I will update the thread soon.
I did a fresh checkout and ran the tests through ant and all tests pass.
[exec] Zookeeper_watchers::testObjectSessionWatcher2 : elapsed 63 : OK
[exec] Zookeeper_watchers::testNodeWatcher1 : elapsed 60 : OK
[exec] Zookeeper_watchers::testChildWatcher1 : elapsed 107 : OK
[exec] Zookeeper_watchers::testChildWatcher2 : elapsed 109 : OK
[exec] OK (56)
BUILD SUCCESSFUL
Can you re-trigger the build/test for this patch? I tried to trigger it by re-uploading the patch, but that didn't help.
Dheeraj,
The pre commit build servers are down. We are waiting for them to come up so that we can get back on track with 3.4 release.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12489102/ZOOKEEPER-1108.patch
against trunk revision 1152141.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/438//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/438//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/438//console
This message is automatically generated.
Looks like memory corruption in the tests.
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/438/console
Dheeraj, can you please take a look?
Added a zk_close to the new tests, to see if it makes any difference
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12490123/ZOOKEEPER-1108.patch
against trunk revision 1156649.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/452//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/452//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/452//console
This message is automatically generated.
Mahadev, can you review the patch and check it in , if all looks good to you? All tests are passing now without any issues.
Dheeraj,
Looks like you added, the new state of NOT_CONNECTED_STATE in the header definitions. That probably needs a little more work, since the java client will need to have similar changes. I'd say we get rid of the new state in this patch! Makes sense?
If we get rid of the new state, then we will have to get rid of the tests as well. As we cant test it without having the new state, as zk_init will return INVALID state in that case.
This new state is not publicly exposed.
We already have a different state in java (NULL) . In C, we are using 0 for both CLOSED state and NOT_CONNECTED state. In java, NOT_CONNECTED state is null.
Mahadev, what do you say? We would like to get this into 3.4 since it causes us a lot of grief with the auth in the C client. I think Dheeraj's point is good, and the Java and C clients are already divergent enough in this manner that this shouldn't be an issue.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12491680/ZOOKEEPER-1108.patch
against trunk revision 1159929.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/474//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/474//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/474//console
This message is automatically generated.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12491680/ZOOKEEPER-1108.patch
against trunk revision 1165443.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/519//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/519//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/519//console
This message is automatically generated.
Integrated in ZooKeeper-trunk #1299 (See https://builds.apache.org/job/ZooKeeper-trunk/1299/)
ZOOKEEPER-1108. Various bugs in zoo_add_auth in C. (Dheeraj Agrawal via mahadev)
mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1166970
Files :
- /zookeeper/trunk/CHANGES.txt
- /zookeeper/trunk/src/c/src/zk_adaptor.h
- /zookeeper/trunk/src/c/src/zookeeper.c
- /zookeeper/trunk/src/c/tests/TestClient.cc
- /zookeeper/trunk/src/c/tests/TestZookeeperInit.cc
- /zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java
- /zookeeper/trunk/src/java/main/org/apache/zookeeper/ZooKeeper.java
Another Bug: in send_auth_info, the check is not correct
while (auth->next != NULL)
{ //--BUG: in cases where there is only one auth in the list, this will never send that auth, as its next will be NULL rc = send_info_packet(zh, auth); auth = auth->next; }FIX IS:
{ rc = send_info_packet(zh, auth); auth = auth->next; }do
while (auth != NULL); //this will make sure that even if there is one auth ,that will get sent.