Uploaded image for project: 'Accumulo'
  1. Accumulo
  2. ACCUMULO-4066

Conditional mutation processing performance could be improved.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.4, 1.7.0
    • Fix Version/s: 1.6.5, 1.7.1, 1.8.0
    • Component/s: tserver
    • Labels:
      None

      Description

      When processing conditional mutations tablets reads are done. The way the current implementation does tablet reads has a lot of overhead. For each condition the following is done :

      • Opens and reserves iterators files.
      • Parse table iterators from table config (involves scanning and filtering entire table config)
      • Merges condition iterators and table iterators
      • Constructs iterator stack.

      I created a branch where these operations (except for constructing iterator stack) are done per tablet and/or per batch of conditional mutations. Doing this I am seeing a 3x speed up in conditional mutation processing rates when data is cached.

        Issue Links

          Activity

          Hide
          elserj Josh Elser added a comment -

          This one slipped in under my radar. Thought I'd give your changes a glance. 3x speed up is awesome!

          -    for (Condition cond : cm.getConditions()) {
          +    // sort conditions inorder to get better lookup performance. Sort on client side so tserver does not have to do it.
          +    Condition[] ca = cm.getConditions().toArray(new Condition[cm.getConditions().size()]);
          +    Arrays.sort(ca, CONDITION_COMPARATOR);
          

          To confirm, the server doesnt' rely on the sorted order, just hopes for it for performance reasons?

          I see a lot of changes in IteratorUtil (I assume to your point about loading iterators from the table config). How did this used to work? You had lots of new tests added for the other cases – do we have good coverage for IteratorUtil already?

          Show
          elserj Josh Elser added a comment - This one slipped in under my radar. Thought I'd give your changes a glance. 3x speed up is awesome! - for (Condition cond : cm.getConditions()) { + // sort conditions inorder to get better lookup performance. Sort on client side so tserver does not have to do it. + Condition[] ca = cm.getConditions().toArray( new Condition[cm.getConditions().size()]); + Arrays.sort(ca, CONDITION_COMPARATOR); To confirm, the server doesnt' rely on the sorted order, just hopes for it for performance reasons? I see a lot of changes in IteratorUtil (I assume to your point about loading iterators from the table config). How did this used to work? You had lots of new tests added for the other cases – do we have good coverage for IteratorUtil already?
          Hide
          kturner Keith Turner added a comment -

          To confirm, the server doesnt' rely on the sorted order, just hopes for it for performance reasons?

          correct.

          I see a lot of changes in IteratorUtil (I assume to your point about loading iterators from the table config). How did this used to work

          I needed parse table iter config once, cache that and then later merge condition iterators. I changed IteratorUtil to support this use case. Used to it would parse the table iterator config for each condition. I also modified the code to support caching class name to classes (Instead of going to the VFS classloader for each condition to load a class).

          Adding some test for IteratorUtil is a good idea.

          Show
          kturner Keith Turner added a comment - To confirm, the server doesnt' rely on the sorted order, just hopes for it for performance reasons? correct. I see a lot of changes in IteratorUtil (I assume to your point about loading iterators from the table config). How did this used to work I needed parse table iter config once, cache that and then later merge condition iterators. I changed IteratorUtil to support this use case. Used to it would parse the table iterator config for each condition. I also modified the code to support caching class name to classes (Instead of going to the VFS classloader for each condition to load a class). Adding some test for IteratorUtil is a good idea.
          Hide
          kturner Keith Turner added a comment -

          Reopened to add some test for IteratorUtil

          Show
          kturner Keith Turner added a comment - Reopened to add some test for IteratorUtil
          Hide
          kturner Keith Turner added a comment -

          Some of the test added to ConditionalWriterIT ensure that table and condition iterators are merged properly. This is an indirect test of IteratorUtil

          Show
          kturner Keith Turner added a comment - Some of the test added to ConditionalWriterIT ensure that table and condition iterators are merged properly. This is an indirect test of IteratorUtil
          Hide
          ctubbsii Christopher Tubbs added a comment -

          Marking this as fixed for now, in prep for 1.6.5 release. New tests for IteratorUtil can be done in a separate ticket, if necessary.

          Show
          ctubbsii Christopher Tubbs added a comment - Marking this as fixed for now, in prep for 1.6.5 release. New tests for IteratorUtil can be done in a separate ticket, if necessary.

            People

            • Assignee:
              kturner Keith Turner
              Reporter:
              kturner Keith Turner
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 0.5h
                0.5h

                  Development