Description
After the fix for TS-1962, there are some scenarios where Traffic Server can decide to reuse an origin server connection (because of keep-alive), even when the destination port is different. I encountered this with 4.0.2, though this bug is still in trunk, but it is masked by another fix.
When using Traffic Server 4.0.2 as a transparent proxy, two requests that are made to the same origin server but on different ports, may be sent to the same port.
All GETs work as expected:
- GET to origin:80 arrives at origin:80
- GET to origin:80, followed by GET to origin:9999 arrive at origin:80 and
origin:9999 respectively - GET to origin:9999, followed by GET to origin:80 arrive at origin:9999 and
origin:80 respectively
But stuff gets weird after a POST:
- POST to origin:9999, followed by GET to origin:80, arrive at origin:9999 and
origin:9999 respectively
This results in requests being sent to incorrect origin server ports, which may
result in the HTML app getting unexpected errors, or unexpected data. A workaround is to disable keep-alive, but that is of course bad for performance.
It looks like the problem is in proxy/http/HttpSM.cc, in function HttpSM::do_http_server_open():
4438 void 4439 HttpSM::do_http_server_open(bool raw) 4440 { ... 4510 if (raw == false && t_state.txn_conf->share_server_sessions && 4511 (t_state.txn_conf->keep_alive_post_out == 1 || t_state.hdr_info.request_content_length == 0) && 4512 !is_private() && ua_session != NULL) { ... 4540 else if ((!t_state.txn_conf->share_server_sessions || is_private()) && (ua_session != NULL)) { 4541 HttpServerSession *existing_ss = ua_session->get_server_session(); 4542 4543 if (existing_ss) { 4544 // [amc] Is this OK? Should we compare ports? (not done by ats_ip_addr_cmp) 4545 if (ats_ip_addr_eq(&existing_ss->server_ip.sa, &t_state.current.server->addr.sa)) { 4546 ua_session->attach_server_session(NULL); 4547 existing_ss->state = HSS_ACTIVE; 4548 this->attach_server_session(existing_ss); 4549 hsm_release_assert(server_session != NULL); 4550 handle_http_server_open(); 4551 return; 4552 } else { 4553 // As this is in the non-sharing configuration, we want to close 4554 // the existing connection and call connect_re to get a new one 4555 existing_ss->release(); 4556 ua_session->attach_server_session(NULL); 4557 } 4558 } 4559 }
When the condition in line 4540 is hit, and get_server_session() returns an existing session, the addresses of the existing session and the current one are compared (line 4545), but not the port numbers, as the comment in line 4544 indicates. This is incorrect, as an existing session to a different port on the same origin server should never be reused.
The most obvious fix is to add a comparison of the ports, e.g.:
if (ats_ip_addr_eq(&existing_ss->server_ip.sa, &t_state.current.server->addr.sa) && ats_ip_port_cast(&existing_ss->server_ip.sa) == ats_ip_port_cast(&t_state.current.server->addr)) {
I have added this locally as a patch, and with that, the scenario described at the top of this bug does not occur anymore.
Note that the issue in this particular scenario seems to be masked by the fix for TS-312. Initially, I tried reproducing with trunk, but I found it only occurs before that fix. After TS-312, the condition in line 4540 above is not hit anymore, but the first one in line 4510 is hit instead. However, there may still be ways to make the execution flow branch to line 4545, and that could still trigger the bug.