Uploaded image for project: 'Traffic Server'
  1. Traffic Server
  2. TS-2481

Incorrect origin server port used sometimes (with keep-alive)

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 4.2.0
    • HTTP
    • None

    Description

      After the fix for TS-1962, there are some scenarios where Traffic Server can decide to reuse an origin server connection (because of keep-alive), even when the destination port is different. I encountered this with 4.0.2, though this bug is still in trunk, but it is masked by another fix.

      When using Traffic Server 4.0.2 as a transparent proxy, two requests that are made to the same origin server but on different ports, may be sent to the same port.

      All GETs work as expected:

      • GET to origin:80 arrives at origin:80
      • GET to origin:80, followed by GET to origin:9999 arrive at origin:80 and
        origin:9999 respectively
      • GET to origin:9999, followed by GET to origin:80 arrive at origin:9999 and
        origin:80 respectively

      But stuff gets weird after a POST:

      • POST to origin:9999, followed by GET to origin:80, arrive at origin:9999 and
        origin:9999 respectively

      This results in requests being sent to incorrect origin server ports, which may
      result in the HTML app getting unexpected errors, or unexpected data. A workaround is to disable keep-alive, but that is of course bad for performance.

      It looks like the problem is in proxy/http/HttpSM.cc, in function HttpSM::do_http_server_open():

        4438  void
        4439  HttpSM::do_http_server_open(bool raw)
        4440  {
      ...
        4510    if (raw == false && t_state.txn_conf->share_server_sessions &&
        4511        (t_state.txn_conf->keep_alive_post_out == 1 || t_state.hdr_info.request_content_length == 0) &&
        4512         !is_private() && ua_session != NULL) {
      ...
        4540    else if ((!t_state.txn_conf->share_server_sessions || is_private()) && (ua_session != NULL)) {
        4541      HttpServerSession *existing_ss = ua_session->get_server_session();
        4542
        4543      if (existing_ss) {
        4544        // [amc] Is this OK? Should we compare ports? (not done by ats_ip_addr_cmp)
        4545        if (ats_ip_addr_eq(&existing_ss->server_ip.sa, &t_state.current.server->addr.sa)) {
        4546          ua_session->attach_server_session(NULL);
        4547          existing_ss->state = HSS_ACTIVE;
        4548          this->attach_server_session(existing_ss);
        4549          hsm_release_assert(server_session != NULL);
        4550          handle_http_server_open();
        4551          return;
        4552        } else {
        4553          // As this is in the non-sharing configuration, we want to close
        4554          // the existing connection and call connect_re to get a new one
        4555          existing_ss->release();
        4556          ua_session->attach_server_session(NULL);
        4557        }
        4558      }
        4559    }
      

      When the condition in line 4540 is hit, and get_server_session() returns an existing session, the addresses of the existing session and the current one are compared (line 4545), but not the port numbers, as the comment in line 4544 indicates. This is incorrect, as an existing session to a different port on the same origin server should never be reused.

      The most obvious fix is to add a comparison of the ports, e.g.:

            if (ats_ip_addr_eq(&existing_ss->server_ip.sa, &t_state.current.server->addr.sa) &&
                ats_ip_port_cast(&existing_ss->server_ip.sa) == ats_ip_port_cast(&t_state.current.server->addr)) {
      

      I have added this locally as a patch, and with that, the scenario described at the top of this bug does not occur anymore.

      Note that the issue in this particular scenario seems to be masked by the fix for TS-312. Initially, I tried reproducing with trunk, but I found it only occurs before that fix. After TS-312, the condition in line 4540 above is not hit anymore, but the first one in line 4510 is hit instead. However, there may still be ways to make the execution flow branch to line 4545, and that could still trigger the bug.

      Attachments

        1. fix-ts-2481-2.diff
          0.8 kB
          Dimitry Andric
        2. fix-ts-2481-1.diff
          0.8 kB
          Dimitry Andric

        Activity

          People

            zwoop Leif Hedstrom
            dim Dimitry Andric
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: