Hadoop HDFS / HDFS-15486

Costly sendResponse operation slows down async editlog handling


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.0

    Description

      When the NameNode in our cluster is under very high load, we find that it often gets stuck in async edit log handling.

      We used the async-profiler tool to capture a flame graph.

      The stall happens where the async edit log thread takes an Edit from the queue and then triggers the sendResponse call.

      The sendResponse call is relatively expensive here because our cluster has security enabled, so some encoding work is done whenever a response is returned to the client.

      We frequently catch moments of costly sendResponse operations when the RPC call queue is full.

      Slow consumption of Edits by the async edit log thread easily fills the pending Edit queue, which then blocks the enqueue operation invoked from write-locked methods in the FSNamesystem class. Because those callers hold the write lock while they wait, other namespace operations stall as well.
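      To make the blocking chain concrete, here is a minimal Java sketch of the single-consumer pattern described above. It is not HDFS code: `SingleResponderSketch`, `Edit`, `logEditUnderWriteLock` and `run` are hypothetical names, a `ReentrantReadWriteLock` stands in for the FSNamesystem lock, an `ArrayBlockingQueue` stands in for the pending Edit queue, and `Thread.sleep` stands in for a costly secured sendResponse. The producer counts how many enqueues found the queue full and had to wait while still holding the write lock.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SingleResponderSketch {
    // Hypothetical stand-in for an edit whose RPC caller is waiting for a response.
    static final class Edit {
        final long txid;
        Edit(long txid) { this.txid = txid; }
    }

    private final BlockingQueue<Edit> pendingQ;            // bounded, like the pending Edit queue
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private final AtomicInteger blockedEnqueues = new AtomicInteger();
    private final long responseCostMs;                     // simulated cost of a secured sendResponse

    SingleResponderSketch(int queueCapacity, long responseCostMs) {
        this.pendingQ = new ArrayBlockingQueue<>(queueCapacity);
        this.responseCostMs = responseCostMs;
    }

    // Producer side: enqueue while holding the write lock, as write-locked namespace methods do.
    void logEditUnderWriteLock(Edit edit) throws InterruptedException {
        fsLock.writeLock().lock();
        try {
            if (!pendingQ.offer(edit)) {
                blockedEnqueues.incrementAndGet(); // queue full: now waiting with the lock held
                pendingQ.put(edit);
            }
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    // Consumer side: one thread takes each edit and sends its response inline.
    void consumeLoop(int count) throws InterruptedException {
        for (int i = 0; i < count; i++) {
            Edit edit = pendingQ.take();
            // (the edit would be written and synced here)
            sendResponse(edit);
        }
    }

    private void sendResponse(Edit edit) throws InterruptedException {
        Thread.sleep(responseCostMs); // stands in for encoding and writing the RPC response
    }

    // Feeds `edits` edits to a single consumer; returns how many enqueues had to block.
    static int run(int queueCapacity, long responseCostMs, int edits) {
        SingleResponderSketch log = new SingleResponderSketch(queueCapacity, responseCostMs);
        Thread consumer = new Thread(() -> {
            try {
                log.consumeLoop(edits);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (long txid = 1; txid <= edits; txid++) {
                log.logEditUnderWriteLock(new Edit(txid));
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log.blockedEnqueues.get();
    }

    public static void main(String[] args) {
        int blocked = run(4, 20, 20);
        System.out.println("enqueues that blocked under the write lock: " + blocked);
    }
}
```

      With a 20 ms simulated response cost and a queue capacity of 4, typically most enqueues after the first few have to wait, and each wait happens with the write lock held.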

      The proposed enhancement is to use multiple threads to execute sendResponse calls in parallel. sendResponse does not need the write lock for protection, so this change is safe.
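      A minimal Java sketch of the proposed change, assuming a fixed-size thread pool (`Executors.newFixedThreadPool`) is an acceptable way to run responses in parallel. It is not the attached draft patch: `ParallelResponderSketch`, `Edit`, `run` and the pool size are hypothetical, an `ArrayBlockingQueue` stands in for the pending Edit queue, and `Thread.sleep` stands in for the costly secured sendResponse. The single consumer still drains the queue in order (and would sync each edit first), but it hands each response off to the pool instead of sending it inline.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelResponderSketch {
    // Hypothetical stand-in for an edit whose RPC caller is waiting for a response.
    static final class Edit {
        final long txid;
        Edit(long txid) { this.txid = txid; }
    }

    private final BlockingQueue<Edit> pendingQ;        // bounded, like the pending Edit queue
    private final ExecutorService responders;          // hypothetical pool of response threads
    private final long responseCostMs;                 // simulated cost of a secured sendResponse
    private final AtomicInteger responsesSent = new AtomicInteger();
    private final CountDownLatch allSent;

    ParallelResponderSketch(int queueCapacity, int responderThreads, long responseCostMs, int edits) {
        this.pendingQ = new ArrayBlockingQueue<>(queueCapacity);
        this.responders = Executors.newFixedThreadPool(responderThreads);
        this.responseCostMs = responseCostMs;
        this.allSent = new CountDownLatch(edits);
    }

    // Single consumer: sync the edit, then hand the costly response off to the pool.
    void consumeLoop(int count) throws InterruptedException {
        for (int i = 0; i < count; i++) {
            Edit edit = pendingQ.take();
            // (the edit would be written and synced here; respond only once it is durable)
            responders.execute(() -> sendResponse(edit));
        }
    }

    private void sendResponse(Edit edit) {
        try {
            Thread.sleep(responseCostMs); // stands in for encoding and writing the RPC response
            responsesSent.incrementAndGet();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            allSent.countDown();
        }
    }

    // Feeds `edits` edits through the pipeline; returns how many responses were sent.
    static int run(int queueCapacity, int responderThreads, long responseCostMs, int edits) {
        ParallelResponderSketch log =
            new ParallelResponderSketch(queueCapacity, responderThreads, responseCostMs, edits);
        Thread consumer = new Thread(() -> {
            try {
                log.consumeLoop(edits);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (long txid = 1; txid <= edits; txid++) {
                log.pendingQ.put(new Edit(txid)); // producer-side write lock omitted for brevity
            }
            consumer.join();
            log.allSent.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            log.responders.shutdown();
        }
        return log.responsesSent.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int sent = run(4, 8, 20, 20);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(sent + " responses sent in " + elapsedMs + " ms");
    }
}
```

      One trade-off of this sketch: responses may now complete out of order, and because `newFixedThreadPool` uses an unbounded work queue, backpressure from slow responses no longer reaches the pending Edit queue, so a real implementation may want to bound the pool's queue. Responses must also still be dispatched only after the corresponding edits are synced.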

      Attachments

        1. async-profile-(1).jpg (460 kB, Yiqun Lin)
        2. Async-profile-(2).jpg (186 kB, Yiqun Lin)
        3. HDFS-15486_draft.patch (10 kB, Yiqun Lin)


          People

            Assignee: Unassigned
            Reporter: Yiqun Lin (linyiqun)
