HBase / HBASE-6295

Possible performance improvement in client batch operations: presplit and send in background


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.95.2
    • Fix Version/s: 0.98.0, 0.95.2
    • Component/s: Client, Performance
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: The puts are now streamed, i.e. sent asynchronously to the region servers, if autoflush is set to false. If a region server is slow or does not respond, its puts are kept in the write buffer while the others are sent to their respective region servers, until the write buffer is full. This feature keeps the semantics of the interface that already existed in 0.94 when using autoflush.

    Description

      Today the batch algorithm is:

      for Operation o: List<Op>{
        add o to todolist
        if todolist > maxsize or o last in list
          split todolist per location
          send split lists to region servers
          clear todolist
          wait
      }
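
For illustration only, the loop above could be sketched in Java roughly as follows. This is a simplified, single-threaded model and not the HBase client code: `GlobalBatchSketch`, `run`, and the `locate` function are hypothetical names, operations are plain row-key strings, and "send" and "wait" are only recorded in a log instead of being real RPCs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model of the current "fill one list, split, send, wait" loop. */
class GlobalBatchSketch {

    /** Returns a log of the sends and waits, in the order they would happen. */
    static List<String> run(List<String> rows, Function<String, String> locate, int maxSize) {
        List<String> log = new ArrayList<>();
        List<String> todo = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            todo.add(rows.get(i));
            boolean last = (i == rows.size() - 1);
            if (todo.size() > maxSize || last) {
                // Split the whole buffer per location only once it is full.
                Map<String, List<String>> byLocation = new LinkedHashMap<>();
                for (String row : todo) {
                    byLocation.computeIfAbsent(locate.apply(row), k -> new ArrayList<>()).add(row);
                }
                byLocation.forEach((loc, list) -> log.add("send " + loc + " x" + list.size()));
                todo.clear();
                // The caller blocks here until every server has answered.
                log.add("wait");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Assumption for the demo: the first character of the row key is its location.
        List<String> rows = Arrays.asList("a1", "b1", "a2", "b2", "a3");
        System.out.println(run(rows, r -> r.substring(0, 1), 2));
    }
}
```

In this model every full buffer costs one "wait", so a single slow location delays all the others.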
      

      We could:

      • create the final object immediately instead of an intermediate array
      • split per location immediately
      • instead of sending when the list as a whole is full, send as soon as there is enough data for a single location

      It would be:

      for Operation o: List<Op>{
        get location
        add o to location.todolist
        if (location.todolist > maxLocationSize)
          send location.todolist to region server 
          clear location.todolist
          // don't wait, continue the loop
      }
      send remaining
      wait
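
A minimal Java sketch of this proposed loop, under stated assumptions: `PerLocationBatchSketch`, `sendToServer`, and `locate` are hypothetical stand-ins (no real region server is contacted), a small thread pool plays the role of the background sender, and error handling is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical model of "split per location immediately, send in background". */
class PerLocationBatchSketch {

    /** Stand-in for the RPC to one region server; returns the number of operations sent. */
    static int sendToServer(String location, List<String> batch) {
        return batch.size();
    }

    /** Hands one location's buffer to the pool and empties it, without waiting. */
    private static void submit(ExecutorService pool, String location, List<String> todo,
                               List<CompletableFuture<Integer>> inFlight, List<String> sent) {
        List<String> batch = new ArrayList<>(todo); // copy so the buffer can be refilled
        todo.clear();
        sent.add(location + " x" + batch.size());
        inFlight.add(CompletableFuture.supplyAsync(() -> sendToServer(location, batch), pool));
    }

    /** Returns the batches in submission order, after all of them have completed. */
    static List<String> run(List<String> rows, Function<String, String> locate, int maxLocationSize) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Map<String, List<String>> todoByLocation = new LinkedHashMap<>();
            List<CompletableFuture<Integer>> inFlight = new ArrayList<>();
            List<String> sent = new ArrayList<>();
            for (String row : rows) {
                String location = locate.apply(row);
                List<String> todo = todoByLocation.computeIfAbsent(location, k -> new ArrayList<>());
                todo.add(row);
                if (todo.size() > maxLocationSize) {
                    submit(pool, location, todo, inFlight, sent); // don't wait, continue the loop
                }
            }
            // Send remaining: whatever is left in each per-location buffer.
            todoByLocation.forEach((location, todo) -> {
                if (!todo.isEmpty()) {
                    submit(pool, location, todo, inFlight, sent);
                }
            });
            // Wait once, at the very end.
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
            return sent;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Assumption for the demo: the first character of the row key is its location.
        List<String> rows = Arrays.asList("a1", "a2", "a3", "b1", "a4", "b2");
        System.out.println(run(rows, r -> r.substring(0, 1), 2));
    }
}
```

The only blocking point is the final `join()`, so a batch for one location can be in flight while the loop keeps filling the buffers of the others.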
      

      This is not trivial to write once error management is added: the retry list must be shared with the operations being added to the todolist. But it is doable.
      It is mainly of interest for 'big' writes.
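
One possible way to picture the shared retry list is the sketch below. It is only an assumption about how such sharing could look, not the approach taken in the attached patches: `SharedRetryBufferSketch`, its `sender` callback (which returns the operations that failed), and `maxDrainRounds` are all hypothetical, and the sketch is synchronous, so it sidesteps the concurrency issues that make the real problem hard.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical model: failed operations rejoin the same per-location buffer as new ones. */
class SharedRetryBufferSketch {
    /** Every batch handed to the sender, in order, kept for inspection. */
    final List<List<String>> batches = new ArrayList<>();
    private final Map<String, List<String>> todoByLocation = new LinkedHashMap<>();
    /** Stand-in for the RPC: takes (location, batch), returns the operations that failed. */
    private final BiFunction<String, List<String>, List<String>> sender;

    SharedRetryBufferSketch(BiFunction<String, List<String>, List<String>> sender) {
        this.sender = sender;
    }

    /** Sends one location's buffer and puts any failures back into that buffer. */
    private void flush(String location) {
        List<String> todo = todoByLocation.get(location);
        List<String> batch = new ArrayList<>(todo);
        todo.clear();
        batches.add(batch);
        todo.addAll(sender.apply(location, batch)); // retries share the buffer with new operations
    }

    /** Returns the operations still unsent after maxDrainRounds final attempts. */
    List<String> run(List<String> rows, Function<String, String> locate,
                     int maxLocationSize, int maxDrainRounds) {
        for (String row : rows) {
            String location = locate.apply(row);
            List<String> todo = todoByLocation.computeIfAbsent(location, k -> new ArrayList<>());
            todo.add(row);
            if (todo.size() > maxLocationSize) {
                flush(location);
            }
        }
        // Send remaining, bounded so that a permanently failing server cannot loop forever.
        for (int round = 0; round < maxDrainRounds; round++) {
            for (String location : todoByLocation.keySet()) {
                if (!todoByLocation.get(location).isEmpty()) {
                    flush(location);
                }
            }
        }
        List<String> leftover = new ArrayList<>();
        todoByLocation.values().forEach(leftover::addAll);
        return leftover;
    }

    public static void main(String[] args) {
        // Demo sender that never fails; the first character of the row key is its location.
        SharedRetryBufferSketch sketch = new SharedRetryBufferSketch((loc, batch) -> new ArrayList<>());
        List<String> rows = List.of("a1", "a2", "a3", "a4");
        System.out.println(sketch.run(rows, r -> r.substring(0, 1), 1, 3) + " " + sketch.batches);
    }
}
```

With a sender that fails `a2` once, the retried `a2` travels in the same batch as the newly added `a3`, which is the sharing the description refers to. In an asynchronous client the failure would arrive on another thread, so the buffer would also need synchronization.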

      Attachments

        1. 6295.v1.patch
          17 kB
          Nicolas Liochon
        2. 6295.v2.patch
          18 kB
          Nicolas Liochon
        3. 6295.v3.patch
          19 kB
          Nicolas Liochon
        4. 6295.v4.patch
          21 kB
          Nicolas Liochon
        5. 6295.v5.patch
          21 kB
          Nicolas Liochon
        6. 6295.v6.patch
          28 kB
          Nicolas Liochon
        7. 6295.v8.patch
          80 kB
          Nicolas Liochon
        8. 6295.v9.patch
          85 kB
          Nicolas Liochon
        9. 6295.v11.patch
          87 kB
          Nicolas Liochon
        10. 6295.v12.patch
          87 kB
          Nicolas Liochon
        11. 6295.v14.patch
          87 kB
          Nicolas Liochon
        12. 6295.v15.patch
          101 kB
          Nicolas Liochon
        13. 6295.addendum.patch
          10 kB
          Nicolas Liochon
        14. hbase-ycsb-workloads Build time trend.png
          23 kB
          Elliott Neil Clark

          People

            Assignee: Nicolas Liochon (nkeywal)
            Reporter: Nicolas Liochon (nkeywal)
            Votes: 0
            Watchers: 21
