Hadoop HDFS / HDFS-13639

SlotReleaser is not fast enough


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 2.6.0, 3.0.2
    • Fix Version/s: 3.3.1, 3.4.0
    • Component/s: hdfs-client
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      While testing HDFS short-circuit read performance with YCSB, we found that the SlotReleaser in ShortCircuitCache has a performance problem: slot releasing reaches only 1000+ QPS, while slot allocation runs at ~3000 QPS. As a result, replica info on the DataNode cannot be released in time, which causes many GCs and eventually full GCs.
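
      To make the gap concrete, here is a rough back-of-the-envelope sketch. It assumes both rates stay constant, which the measurements do not promise (the release rate is only given as "1000+"), and the class and method names are made up for illustration:

```java
class SlotBacklog {
    /**
     * Slots still waiting to be released after `seconds`, assuming steady
     * allocation and release rates (both per second). Purely illustrative.
     */
    static long backlogAfter(long allocPerSec, long releasePerSec, long seconds) {
        long netPerSec = Math.max(0, allocPerSec - releasePerSec);
        return netPerSec * seconds;
    }

    public static void main(String[] args) {
        // Approximate rates from the description: ~3000 allocations/s, ~1000 releases/s.
        System.out.println(backlogAfter(3000, 1000, 60)); // prints 120000
    }
}
```

      Under these assumptions roughly 120,000 slots pile up per minute, which is consistent with the growing GC pressure on the DataNode.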

       

      The flame graph shows that SlotReleaser spends much of its time connecting to the domain socket and throwing/catching exceptions when closing the domain socket and its streams. Connecting and closing for every release makes no sense. Each time we connect to the domain socket, the DataNode allocates a new thread to free the slot, which involves a lot of costly initialization work. We need to reuse the domain socket.
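
      The reuse idea can be sketched roughly as follows. This is not the attached patch: `Conn`, `ConnFactory`, and `ReusingReleaser` are hypothetical stand-ins (the real code works with Hadoop's domain socket classes), and the fake connection only counts how often a connect happens:

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

class ReusingReleaser {
    /** Hypothetical stand-in for a domain-socket connection to the DataNode. */
    interface Conn extends Closeable {
        void sendRelease(long slotId) throws IOException;
    }

    /** Hypothetical factory that opens a new connection. */
    interface ConnFactory {
        Conn connect() throws IOException;
    }

    private final ConnFactory factory;
    private Conn cached; // kept open between releases instead of reconnecting

    ReusingReleaser(ConnFactory factory) {
        this.factory = factory;
    }

    synchronized void release(long slotId) throws IOException {
        if (cached == null) {
            cached = factory.connect(); // connect only when nothing is cached
        }
        try {
            cached.sendRelease(slotId);
        } catch (IOException e) {
            // Drop the broken connection so the next release reconnects.
            try { cached.close(); } catch (IOException ignored) { }
            cached = null;
            throw e;
        }
    }

    /** Releases n slots through a fake connection and returns the connect count. */
    static int connectsFor(int n) {
        int[] connects = {0};
        ConnFactory fake = () -> {
            connects[0]++;
            return new Conn() {
                @Override public void sendRelease(long slotId) { }
                @Override public void close() { }
            };
        };
        ReusingReleaser releaser = new ReusingReleaser(fake);
        try {
            for (int i = 0; i < n; i++) {
                releaser.release(i);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return connects[0];
    }

    public static void main(String[] args) {
        System.out.println(connectsFor(1000)); // prints 1
    }
}
```

      With a connection per release, the same 1000 releases would cost 1000 connects, and, per the description, 1000 new threads on the DataNode side.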

       

      After switching to reusing the domain socket (see the attached diff), we see a large improvement (see the perf graphs):

      1. Without reusing the domain socket, the YCSB get QPS gets worse and worse, and full GCs start after about 45 minutes. With the domain socket reused, no full GC occurs, the stress test finishes smoothly, and the allocating and releasing QPS match.
      2. Because of DataNode young GCs, the YCSB get QPS without the improvement is also lower than with it: ~3700 vs. ~4200.

       

      Attachments

        1. HDFS-13639.001.patch
          12 kB
          Lisheng Sun
        2. HDFS-13639.002.patch
          11 kB
          Lisheng Sun
        3. HDFS-13639-2.4.diff
          12 kB
          Gang Xie
        4. perf_after_improve_SlotReleaser.png
          79 kB
          Gang Xie
        5. perf_before_improve_SlotReleaser.png
          86 kB
          Gang Xie
        6. ShortCircuitCache_new_slotReleaser.diff
          4 kB
          Gang Xie

          People

            Assignee: Lisheng Sun (leosun08)
            Reporter: Gang Xie (xiegang112)
            Votes: 0
            Watchers: 11
