Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-902 Improvements to thread safety in Samza
  3. SAMZA-896

Improvements to thread-safety in ContainerRequestState

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      The ContainerRequestState class is currently not thread-safe. The class's methods and state variables are called from both the ContainerAllocator and the AMRMCallback handler.

      Here's an analysis I summarized by looking at the current implementation. From the below table, getContainersOnAHost() returns the entire list of containers to the AbstractContainerAllocator which reads it. However, the Callback thread invokes methods that write to the allocatedContainerQueue.

      +-----------------------------+--------------+-----------------+-------------------------+---------------------+
      | Method                      | requestQueue | hostsToCountMap | allocatedContainerQueue | ThreadsAccesedFrom  |
      +-----------------------------+--------------+-----------------+-------------------------+---------------------+
      | updateRequestState          | write        | write/read      | write                   | Callback, Allocator |
      +-----------------------------+--------------+-----------------+-------------------------+---------------------+
      | addContainer                |              | read            | write                   | Callback            |
      +-----------------------------+--------------+-----------------+-------------------------+---------------------+
      | updateStateAfterAssignment  | write        | write           | read                    | Allocator           |
      +-----------------------------+--------------+-----------------+-------------------------+---------------------+
      | releaseExtraContainers      | write        | write           | write                   | Allocator           |
      +-----------------------------+--------------+-----------------+-------------------------+---------------------+
      | releaseUnstartableContainer |              |                 |                         | Allocator           |
      +-----------------------------+--------------+-----------------+-------------------------+---------------------+
      | getContainersOnAHost        |              |                 | read                    | Allocator           |
      +-----------------------------+--------------+-----------------+-------------------------+---------------------+
      | getRequestsQueue            | read         |                 |                         | Allocator           |
      +-----------------------------+--------------+-----------------+-------------------------+---------------------+
      
      

      I'm proposing to make the ContainerRequestState class thread-safe by:
      i) Returning a copy of the list (if callers need the entire list - which they should not) for getContainersOnAHost, getRequestsQueue
      ii) Expose more granular synchronized APIs for things like:
      1. Getting num containers on a host.
      2. Getting the number of pending requests.
      3. Checking if the queue is empty etc. / peeking a queue etc.

      The other methods are already synchronized.

      This improvement is beneficial because:
      1. This allows us to reason about concurrency more effectively when using the request-state in multi-threaded contexts.
      2. It's cleaner for the ContainerRequestState class to expose higher level methods and not expose the guts of the class in its entirety.

      Attachments

        1. SAMZA-896.0.patch
          22 kB
          Jagadish

        Issue Links

          Activity

            People

              jagadish1989@gmail.com Jagadish
              jagadish1989@gmail.com Jagadish
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: