Traffic Server / TS-1158

Race on mutex switching for NetVConnections in UnixNetVConnection::mainEvent

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.3
    • Fix Version/s: 3.1.4, 3.0.5
    • Component/s: Core
    • Labels:
      None
    • Environment:

      ALL

    • Backport to Version:

      Description

      Because of the way session management works, once a lock has been acquired the vio.mutex must be re-verified to be identical to the mutex the lock was taken on. Otherwise there is a race when the mutex is switched, such that the old lock is held while the new lock is not held.

      1. ts-1158-jp1.patch
        0.8 kB
        John Plevyak

        Activity

        weijin added a comment -

        I see that the read_from_net and write_to_net_io functions also have such a mechanism to prevent the race condition. I have read the code again and again, but still cannot figure out how the mutex is switched. Can you explain it in more detail? I would also like to know what the consequences of the race are. Thanks very much.

        John Plevyak added a comment -

        The mutex switch occurs in the HttpSessionManager. When a session is passed to it, the read.vio.mutex and write.vio.mutex from the old controlling HttpSM are replaced with that of a hash bucket of sessions in the Manager (a hash is used to reduce contention on this globally shared data structure). When a session is requested from the HttpSessionManager, they are replaced with those of the new HttpSM which will be using that OS connection. During the swap, both the previous and the new mutexes are held. Nevertheless, a race is possible: a thread grabs a reference to the old (pre-substitution) mutex, then a context switch occurs, the mutexes are swapped, and the lock on the old (pre-substitution) mutex is released. The first thread then resumes and locks the old (pre-substitution) mutex, and now two threads are running while each thinks it holds the mutex for the NetVC. The solution is to ensure, after the lock has been taken, that the mutex we have locked is the same one that is protecting the NetVC. If it is not, we back out and retry later.
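
The lock-then-re-verify step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: `Conn`, `read_vio_mutex`, `handle_with_snapshot`, and `handle_event` are hypothetical names, and a `std::atomic<std::mutex *>` stands in for whatever mutex handle the real NetVC code uses (mutex lifetime is assumed to be managed elsewhere).

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for a NetVC; not a Traffic Server type.
struct Conn {
  std::atomic<std::mutex *> read_vio_mutex{nullptr}; // mutex currently guarding the connection
  int events_handled = 0;                            // state protected by that mutex
};

// Try to handle an event using a previously read mutex pointer.
// Returns false when the caller should back out and retry later.
bool handle_with_snapshot(Conn &c, std::mutex *snapshot) {
  std::unique_lock<std::mutex> lock(*snapshot, std::try_to_lock);
  if (!lock.owns_lock()) {
    return false; // lock is busy: retry later
  }
  // Re-verify: the mutex may have been switched between reading the
  // pointer and acquiring the lock.
  if (c.read_vio_mutex.load() != snapshot) {
    return false; // we hold a stale mutex: back out (lock released on return)
  }
  ++c.events_handled; // safe: we hold the mutex that guards c right now
  return true;
}

// Normal entry point: read the current mutex pointer, then lock and verify.
bool handle_event(Conn &c) {
  return handle_with_snapshot(c, c.read_vio_mutex.load());
}
```

Splitting out `handle_with_snapshot` is purely a choice made for this sketch, so that the stale-pointer case can be reproduced deterministically by passing an old pointer after the switch.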

        John Plevyak added a comment -

        Note that when replacing a mutex, both the new and old mutexes must be held. Also note that this protection (double checking) is only provided in the NetProcessor as it is the only Processor whose VC mutexes are switched. Any virtualization would need to provide the same protection.
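
The replacement side of this rule might look like the sketch below. It is an assumption-laden illustration rather than Traffic Server code: `PooledConn`, `vio_mutex`, and `swap_mutex` are hypothetical names, and the blocking `std::lock` call is a simplification of however the real code acquires the two mutexes.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for a connection whose guarding mutex can be switched.
struct PooledConn {
  std::atomic<std::mutex *> vio_mutex{nullptr};
};

// Switch the mutex guarding `c` from `old_m` to `new_m`.
// Returns false if `old_m` is no longer the mutex guarding `c`.
bool swap_mutex(PooledConn &c, std::mutex &old_m, std::mutex &new_m) {
  if (&old_m == &new_m) {
    return c.vio_mutex.load() == &old_m; // nothing to switch
  }
  // Hold BOTH mutexes for the duration of the switch.
  // std::lock acquires them without risking lock-order deadlock.
  std::lock(old_m, new_m);
  std::lock_guard<std::mutex> hold_old(old_m, std::adopt_lock);
  std::lock_guard<std::mutex> hold_new(new_m, std::adopt_lock);
  if (c.vio_mutex.load() != &old_m) {
    return false; // caller's idea of the current mutex is stale
  }
  c.vio_mutex.store(&new_m);
  return true;
}
```

Holding both mutexes during the switch does not remove the need for the double check on the locking side: a thread that read the old pointer before the switch can still lock the old mutex afterwards.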

        taorui added a comment -

        excellent, thanks again.

        On Wed, 2012-03-21 at 14:59 +0000, John Plevyak (Commented) (JIRA)

        taorui added a comment -

        I am afraid this race is (or may be one of) the root causes of TS-857, but I
        am not sure.

        On Wed, 2012-03-21 at 14:53 +0000, John Plevyak (Commented) (JIRA)

        John Plevyak added a comment -

        I am not sure either, hence the new jira issue.

        Brian Geffon added a comment -

        Backported to 3.0.x in commit 6f8c3d33f005fa114b339b90d00932ef099a59d7


          People

          • Assignee:
            Brian Geffon
            Reporter:
            John Plevyak
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
