Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: None
    • Labels: None

      Description

      As mutations are written to a tablet server, they are buffered; once this buffer exceeds a certain size, the data is dumped to the walog and then inserted into an in-memory sorted map. These walog buffers are per client, and the maximum size is determined by tserver.mutation.queue.max.

      Accumulo 1.5 and 1.6 call hsync() in Hadoop 2, which ensures data is flushed to disk. This introduces a fixed delay when flushing walog buffers. The smaller tserver.mutation.queue.max is, the more frequently the walog buffers are flushed. With many clients writing to a tserver, this is not much of a concern, because all of their walog buffers are flushed using group commit. This results in high throughput because large batches of data are written before hsync is called. However, with only a few clients writing to a tserver, there will be many more calls to hsync. It would be nice if the number of calls to hsync were a function of the amount of data written, regardless of the number of concurrent clients. Currently, as the number of concurrent clients goes down, the number of calls to hsync goes up.
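      The group-commit behavior described above can be illustrated with a minimal sketch (this is illustrative only, not Accumulo's actual walog code; the class and method names are hypothetical). Writers append mutations to a shared buffer, and whoever triggers the flush syncs once on behalf of every mutation batched so far, so syncs scale with bytes written rather than with client count:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of group commit: many writers share one buffer,
// and a single sync covers everything batched since the last flush.
class GroupCommitLog {
    private final List<String> buffer = new ArrayList<>();
    private final long maxBufferBytes;
    private long bufferedBytes = 0;
    private int syncCount = 0;

    GroupCommitLog(long maxBufferBytes) {
        this.maxBufferBytes = maxBufferBytes;
    }

    // Called concurrently by client write threads.
    synchronized void append(String mutation) {
        buffer.add(mutation);
        bufferedBytes += mutation.length();
        if (bufferedBytes >= maxBufferBytes) {
            flush(); // one sync covers every mutation batched so far
        }
    }

    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // In a real walog this would write the batch and call hsync() once.
        syncCount++;
        buffer.clear();
        bufferedBytes = 0;
    }

    synchronized int getSyncCount() {
        return syncCount;
    }
}

public class Main {
    public static void main(String[] args) {
        GroupCommitLog log = new GroupCommitLog(100);
        // 10 "clients" each writing 5 mutations of 10 bytes: 500 bytes total.
        for (int client = 0; client < 10; client++) {
            for (int i = 0; i < 5; i++) {
                log.append("0123456789");
            }
        }
        log.flush();
        // Syncs track bytes written (500 / 100 = 5), not the client count.
        System.out.println(log.getSyncCount()); // prints 5
    }
}
```

      Note how the same 500 bytes cost 5 syncs whether written by one client or ten; with per-client buffers (the 1.5/1.6 behavior), fewer clients would mean smaller batches per flush and therefore more syncs per byte.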

      In 1.5 and 1.6 this can be mitigated by increasing tserver.mutation.queue.max; however, that value is multiplied by the number of concurrent writers. Increasing it can improve the performance of a single writer, but it raises the chance that many concurrent writers exhaust memory.
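      As a sketch of the 1.5/1.6 mitigation, the per-writer buffer size could be raised in accumulo-site.xml like so (the value shown is illustrative, not a recommendation; remember it is allocated per concurrent writer):

```xml
<!-- accumulo-site.xml: per-writer walog buffer size (1.5/1.6) -->
<property>
  <name>tserver.mutation.queue.max</name>
  <value>4M</value>
</property>
```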

          Activity

          elserj Josh Elser added a comment -

          It looks like this actually changed the per-write-thread caches to be global for the tserver? If that is the case, I think we need some updated documentation too given how much we called out tserver.mutation.queue.max in the release notes and elsewhere.

          ecn Eric Newton added a comment -

          Yes. But not really. It's just global tracking of the per-connection cache sizes. And tserver.mutation.queue.max was deprecated for tserver.total.mutation.queue.max, which is a new property. Bonus: you won't blow out the JVM memory by setting this value too high.
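          For example, the replacement property could be set in accumulo-site.xml like so (the value shown is illustrative only; it caps the total across all connections rather than applying per writer):

```xml
<!-- accumulo-site.xml: tserver-wide cap on queued mutations (1.7+) -->
<property>
  <name>tserver.total.mutation.queue.max</name>
  <value>50M</value>
</property>
```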

          elserj Josh Elser added a comment -

          And tserver.mutation.queue.max was deprecated for tserver.total.mutation.queue.max, which is a new property

          Yeah, this is what I meant. It wasn't entirely obvious to me how the new property differed from the old (w/o reading code), most notably the interactions of the old property with the new. Given your comment about not blowing out the JVM, is this not as "critical" for users to set correctly for performance reasons as the old property was?


            People

            • Assignee:
              ecn Eric Newton
            • Reporter:
              kturner Keith Turner
            • Votes:
              0
            • Watchers:
              3


                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0h
                • Logged: 10m
