Linux ubuntu 4.4.0-87-generic
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
ZooKeeper C Client build
When I call zookeeper_close() while there is a pending multi request, I expect the request completes with ZCLOSING (-116) status.
But with the existing code I actually get the following:
- the program exits with SIGABRT from assert(entry) in deserialize_multi()
- and even if I remove this assertion and just break the enclosing loop, the returned status is ZOK but not ZCLOSING
So, there are two defects with processing calls to zookeeper_close() for pending multi requests: improper assertion in implementation and invalid status in confirmation.
1. Screen assert(entry) in deserialize_multi() when it's called due to zookeeper_close() (note that entry may normally become equal to NULL in this case), and as soon as entry == NULL, break the loop. To provide this, deserialize_multi() must be informed by the caller weather it's currently the "normal" or the "special" case.
I propose adding a new parameter rc_hint (return code hint) to deserialize_multi(). When deserialize_multi() is called in "normal" case, rc_hint is preset with ZOK (0), and the behavior is absolutely the same as with the existing code. But when it's called due to zookeeper_close(), the rc_hint is automatically preset with ZCLOSING (-116) by the caller, and this changes the behavior of deserialize_multi() as described above.
How it works:
Let zookeeper_close() is called while there is a pending multi request. Then function deserialize_multi() is called for the so-called "Fake response" on multi request which is fabricated by the function free_completions(). Such fake response includes only the header but zero bytes for the body. Due to this deserialize_MultiHeader(ia, "multiheader", &mhdr), which is called repeatedly for each completion_list_t *entry = dequeue_completion(clist), does not assign the mhdr and keeps mhdr.done == 0 as it was originally initialized. Consequently the while (!mhdr.done) loop does not ever end, and finally falls into the assert(entry) with entry == NULL when all fake sub-requests are "completed".
But if, as I propose, the caller made a hint to deserialize_multi() that it's actually the "special" case (that it processes the fake response indeed, for example), with the proposed changes it would omit improper assertion and break the loop on the first entry == NULL. Now at least deserialize_multi() exits and does not emit SIGABRT.
2. Passthrough the "return code hint" rc_hint, as it was initially specified by the caller, to the deserialize_multi() return code, if the hint is not ZOK (0).
How it works:
With the existing code deserialize_multi() returns unsuccessful rc-code only if there is an error in processing some of subrequests. And if there are no errors, it returns ZOK (0) which is assigned as the default value to rc at the very beginning of the function. Indeed, in the case of fake multi-response there are no errors in subresponses (because they are empty and fake). So, deserialize_multi() returns ZOK (0). Then, with rc = deserialize_multi(xid, cptr, ia) in deserialize_response() it overrides the true ZCLOSING status.
But if the true status (for example, ZCLOSING) is initially hinted to deserialize_multi(), as I propose, deserialize_multi() would reproduce it back instead of irrelevant ZOK (0). And consequently deserialize_response() would finally report the true status (particularly ZCLOSING).
It looks like about the same problem is described in
However, the patch proposed in this ticket also remedies the second linked problem: reporting ZCLOSING status (as required) to the multi-request completion handler.