Hadoop HDFS / HDFS-7704

DN heartbeat to Active NN may be blocked and expire if connection to Standby NN continues to time out.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.7.0
    • Component/s: datanode, namenode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      There are a couple of synchronous calls in BPOfferService (i.e. reportBadBlocks and trySendErrorReport) that wait for both actor threads (one per NameNode) to process them.
      These calls are made with the writeLock held.
      When reportBadBlocks() is blocked at the RPC layer because a NameNode is unreachable, subsequent heartbeat response processing has to wait for the write lock. It eventually gets through, but it takes too long and blocks the next heartbeat.
      In our HA cluster setup, the standby NameNode was taking a long time to process the request.
      Requesting an improvement in the DataNode to make the above calls asynchronous: these reports don't have any specific deadlines, so an extra few seconds of delay should be acceptable.
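      The asynchronous approach requested above can be sketched in Java. This is a minimal illustration under stated assumptions, not the committed patch: OfferService, Actor, ActorAction, enqueue, drainPending and processHeartbeatResponse are hypothetical stand-ins loosely modelled on BPOfferService and its per-NameNode actor threads, and the simulated RPC fails immediately instead of hanging until a timeout. reportBadBlocks holds the write lock only long enough to enqueue one action per actor. Each actor then delivers its queue from its own loop, outside the shared lock, so an unreachable standby delays only its own reports and not heartbeat processing for the active NameNode. trySendErrorReport could be queued the same way but is not shown.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AsyncReportSketch {

  /** A deferred report; it is delivered only when an actor drains its queue. */
  interface ActorAction {
    void reportTo(Actor actor) throws IOException;
  }

  /** Hypothetical stand-in for a per-NameNode actor thread (cf. BPServiceActor). */
  static class Actor {
    final String nameNode;
    final boolean reachable; // false simulates an unreachable or timing-out NameNode
    final List<String> delivered = Collections.synchronizedList(new ArrayList<>());
    final LinkedBlockingQueue<ActorAction> pending = new LinkedBlockingQueue<>();

    Actor(String nameNode, boolean reachable) {
      this.nameNode = nameNode;
      this.reachable = reachable;
    }

    /** Simulated RPC; fails immediately here instead of hanging until a timeout. */
    void rpc(String msg) throws IOException {
      if (!reachable) {
        throw new IOException("timed out talking to " + nameNode);
      }
      delivered.add(msg);
    }

    /** Non-blocking: the caller enqueues and returns without waiting on the NameNode. */
    void enqueue(ActorAction action) {
      pending.add(action);
    }

    /** Called from this actor's own loop, outside the shared lock; returns reports sent. */
    int drainPending() {
      List<ActorAction> batch = new ArrayList<>();
      pending.drainTo(batch);
      int sent = 0;
      for (ActorAction action : batch) {
        try {
          action.reportTo(this);
          sent++;
        } catch (IOException e) {
          pending.add(action); // no hard deadline, so retry on the next pass
        }
      }
      return sent;
    }
  }

  /** Hypothetical stand-in for BPOfferService and its shared read/write lock. */
  static class OfferService {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final List<Actor> actors = new ArrayList<>();

    /** Holds the write lock only long enough to enqueue one action per actor. */
    void reportBadBlocks(String block) {
      lock.writeLock().lock();
      try {
        for (Actor actor : actors) {
          actor.enqueue(a -> a.rpc("reportBadBlocks " + block));
        }
      } finally {
        lock.writeLock().unlock();
      }
    }

    /** Heartbeat-response handling also needs the write lock; false if it is held. */
    boolean processHeartbeatResponse() {
      if (!lock.writeLock().tryLock()) {
        return false;
      }
      lock.writeLock().unlock();
      return true;
    }
  }

  public static void main(String[] args) {
    OfferService svc = new OfferService();
    Actor active = new Actor("active-nn", true);
    Actor standby = new Actor("standby-nn", false);
    svc.actors.add(active);
    svc.actors.add(standby);

    svc.reportBadBlocks("blk_1");
    System.out.println("heartbeat not blocked: " + svc.processHeartbeatResponse());
    System.out.println("sent to active: " + active.drainPending());
    System.out.println("sent to standby: " + standby.drainPending());
    System.out.println("still queued for standby: " + standby.pending.size());
  }
}
```

      Running main prints that the heartbeat is not blocked, that one report reached the active NameNode, that none reached the standby, and that one report is still queued for the standby. Retrying from the queue instead of failing the caller fits the point that these reports have no specific deadline.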

      Attachments

        1. HDFS-7704.patch (14 kB, Rushabh Shah)
        2. HDFS-7704-v2.patch (18 kB, Rushabh Shah)
        3. HDFS-7704-v3.patch (18 kB, Rushabh Shah)
        4. HDFS-7704-v4.patch (18 kB, Rushabh Shah)
        5. HDFS-7704-v5.patch (24 kB, Rushabh Shah)

          People

            Assignee: Rushabh Shah (shahrs87)
            Reporter: Rushabh Shah (shahrs87)
            Votes: 0
            Watchers: 13
