Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
jrushford and I were looking at some code, and found this piece which looks suspicious:
} else if (s->current.attempts < s->txn_conf->parent_connect_attempts) { s->current.attempts++; // Are we done with this particular parent? if ((s->current.attempts - 1) % s->http_config_param->per_parent_connect_attempts != 0) { // No we are not done with this parent so retry s->next_action = how_to_open_connection(s); DebugTxn("http_trans", "%s Retrying parent for attempt %d, max %" PRId64, "[handle_response_from_parent]", s->current.attempts, s->http_config_param->per_parent_connect_attempts); return; } else { DebugTxn("http_trans", "%s %d per parent attempts exhausted", "[handle_response_from_parent]", s->current.attempts); // Only mark the parent down if we failed to connect // to the parent otherwise slow origin servers cause // us to mark the parent down if (s->current.state == CONNECTION_ERROR) { s->parent_params->markParentDown(&s->parent_result); } // We are done so look for another parent if any next_lookup = find_server_and_update_current_info(s); } } else { // Done trying parents... fail over to origin server if that is // appropriate DebugTxn("http_trans", "[handle_response_from_parent] Error. No more retries."); s->parent_params->markParentDown(&s->parent_result); s->parent_result.result = PARENT_FAIL; next_lookup = find_server_and_update_current_info(s); }
Here's my thinking: It seems that if we exhaust parent_connect_attempts (which is proxy.config.http.parent_proxy.total_connect_attempts, default 4), we end up always marking whatever the current parent is as potentially down.
This seems wrong, because it might do this even if the number of tries against that particular parent has not been exhausted (that is the proxy.config.http.parent_proxy.per_parent_connect_attempts setting, tested inside the loop).
Looking at this, it feels that we should either remove that markParentDown() entirely, or at a minimum add the same checks as above, i.e.
- s->parent_params->markParentDown(&s->parent_result);
+ if (s->current.state == CONNECTION_ERROR) {
+ s->parent_params->markParentDown(&s->parent_result);
+ }
I'm still doing some more tests, one concern is that it's unclear if the earlier comment is correct, and that we can detect the difference between a connection failure to parent, vs an origin server response to the parent is just being slow (resulting in a timeout and no response to child).
Attachments
Issue Links
- links to