Hadoop HDFS / HDFS-12639

BPOfferService lock may stall all service actors

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.8.0
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      BPOfferService manages the BPServiceActor instances for the active and standby NameNodes. It uses a read-write lock, primarily to protect registration information while determining the active/standby from heartbeats.

      Unfortunately the write lock is held during command processing. If an actor is experiencing high latency processing commands, the other actor will neither be able to register (blocked in createRegistration, setNamespaceInfo, verifyAndSetNamespaceInfo) nor process heartbeats (blocked in updateActorStatesFromHeartbeat).

      The worst-case scenario for processing commands while holding the lock is re-registration. The actor will loop, catching and logging exceptions, leaving the other actor blocked for a non-deterministic (possibly infinite) amount of time.

      The lock must not be held during command processing.
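
      The difference between the two lock scopes can be illustrated with a minimal, self-contained sketch. It is not the actual Hadoop code: LockScopeSketch and all of its methods are hypothetical stand-ins for BPOfferService, and the "command" is just a Runnable that blocks until released. It assumes only that the state update needs the write lock and the command itself does not.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for BPOfferService; not the real Hadoop class. */
public class LockScopeSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String activeActor; // guarded by lock

    /** Problematic shape: the write lock spans the (possibly slow) command. */
    void processCommandHoldingLock(String actor, Runnable command) {
        lock.writeLock().lock();
        try {
            activeActor = actor;
            command.run(); // any latency here stalls every other lock user
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Narrowed shape: the lock covers only the shared-state update. */
    void processCommandOutsideLock(String actor, Runnable command) {
        lock.writeLock().lock();
        try {
            activeActor = actor;
        } finally {
            lock.writeLock().unlock();
        }
        command.run(); // runs with no lock held
    }

    /** Stand-in for the other actor's heartbeat path, which needs the write lock. */
    boolean tryUpdateFromHeartbeat(String actor, long timeoutMs) throws InterruptedException {
        if (!lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // still blocked behind the first actor
        }
        try {
            activeActor = actor;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Runs a slow command on one actor and reports whether the other actor can proceed. */
    static boolean otherActorProgresses(boolean holdLock) throws InterruptedException {
        LockScopeSketch bpos = new LockScopeSketch();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable slowCommand = () -> {
            started.countDown();
            try {
                release.await(); // simulate a high-latency command
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread firstActor = new Thread(() -> {
            if (holdLock) {
                bpos.processCommandHoldingLock("nn1", slowCommand);
            } else {
                bpos.processCommandOutsideLock("nn1", slowCommand);
            }
        });
        firstActor.start();
        started.await(); // the slow command is now in progress
        boolean progressed = bpos.tryUpdateFromHeartbeat("nn2", 200);
        release.countDown();
        firstActor.join();
        return progressed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lock held during command:     " + otherActorProgresses(true));
        System.out.println("lock released before command: " + otherActorProgresses(false));
    }
}
```

      In this sketch the second actor times out when the lock spans the command and proceeds immediately when the lock covers only the state update. A real fix would also have to decide which state the command needs to read and snapshot it under the lock first.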

            People

              Assignee: Hanisha Koneru (hanishakoneru)
              Reporter: Daryn Sharp (daryn)