Accumulo
  1. Accumulo
  2. ACCUMULO-3079

improve system iterator performance by collapsing call stack

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Critical Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: None
    • Labels:
      None

      Description

      System iterators are at the core of the tightest loops in Accumulo, handling every key/value pair that traverses through a scan or a compaction. In many cases, iterators are the current performance bottleneck for Accumulo. Every bit that we can improve performance in the iterators translates into better performance for Accumulo.

      There are several strategies that can be applied to the current code base to improve performance, including:

      1. Inlining calls that are hard for the JVM to inline at runtime
      2. Moving checks for null outside of tight loops when they are invariants within the loop
      3. Eliminating "no-op" iterators at iterator tree construction time
      4. Making frequently used and assigned-once objects final (like iterator sources)

        Issue Links

          Activity

          Hide
          Adam Fuchs added a comment -

          Attached is a preliminary patch that results in about 50% performance improvements for long scans of small entries (~10 bytes). It still needs a bit of cleanup, but it is representative of the desired improvements.

          Show
          Adam Fuchs added a comment - Attached is a preliminary patch that results in about 50% performance improvements for long scans of small entries (~10 bytes). It still needs a bit of cleanup, but it is representative of the desired improvements.
          Hide
          Josh Elser added a comment -

          Nifty, did you use the code from ACCUMULO-3067 to see the performance improvements? Anything else that can be replicated?

          Show
          Josh Elser added a comment - Nifty, did you use the code from ACCUMULO-3067 to see the performance improvements? Anything else that can be replicated?
          Hide
          Adam Fuchs added a comment -

          The test code from ACCUMULO-3067 is what lead me to the improvements. I also have a very simple test harness that isolates and measures performance of the system iterators that I will attach here. Let me know if you think there's a better place for this performance test code to go.

          Show
          Adam Fuchs added a comment - The test code from ACCUMULO-3067 is what lead me to the improvements. I also have a very simple test harness that isolates and measures performance of the system iterators that I will attach here. Let me know if you think there's a better place for this performance test code to go.
          Hide
          Josh Elser added a comment -

          The test code from ACCUMULO-3067 is what lead me to the improvements.

          Ok, cool.

          I also have a very simple test harness that isolates and measures performance of the system iterators that I will attach here.

          Awesome, thank you.

          Let me know if you think there's a better place for this performance test code to go.

          It's hard to write code that automatically catches regressions, but perhaps, with adequate documentation of releases, we could track expected performance numbers and at least improve the odds of not shipping big performance decreases. Anything that is (relatively) easily runnable with documentation how to do so would be great.

          Show
          Josh Elser added a comment - The test code from ACCUMULO-3067 is what lead me to the improvements. Ok, cool. I also have a very simple test harness that isolates and measures performance of the system iterators that I will attach here. Awesome, thank you. Let me know if you think there's a better place for this performance test code to go. It's hard to write code that automatically catches regressions, but perhaps, with adequate documentation of releases, we could track expected performance numbers and at least improve the odds of not shipping big performance decreases. Anything that is (relatively) easily runnable with documentation how to do so would be great.
          Hide
          Keith Turner added a comment -

          Adam Fuchs are you still working on this patch? Was thinking of looking at the code.

          Show
          Keith Turner added a comment - Adam Fuchs are you still working on this patch? Was thinking of looking at the code.
          Hide
          Adam Fuchs added a comment -

          Keith Turner I'm only working on cleaning up things like commented code. I think the optimizations are stable now. There is the possibility of refactoring more of the client code as well to reduce code duplication that this patch introduces, but I would say you should take a look now rather than waiting.

          Show
          Adam Fuchs added a comment - Keith Turner I'm only working on cleaning up things like commented code. I think the optimizations are stable now. There is the possibility of refactoring more of the client code as well to reduce code duplication that this patch introduces, but I would say you should take a look now rather than waiting.
          Hide
          Josh Elser added a comment -

          I just stumbled on trying to figure out exactly what the SynchronizedIterator does (to document it) and noticed that you also made changes here to try to alleviate the extra iterator in the stack. I'm wondering if it makes sense to keep that synchronized filter in the stack during normal cases anyways? Most times, I don't think Iterators really lend themselves well to doing multi-threaded operations anyways. I see in ACCUMULO-533 when you added the SynchronizedIterator, you didn't see any performance impact. I'm wondering if it would make sense to add an option to the client API that allows configuration of the SynchronizedIterator (or the filter as your patch changes it to be). Thoughts?

          Show
          Josh Elser added a comment - I just stumbled on trying to figure out exactly what the SynchronizedIterator does (to document it) and noticed that you also made changes here to try to alleviate the extra iterator in the stack. I'm wondering if it makes sense to keep that synchronized filter in the stack during normal cases anyways? Most times, I don't think Iterators really lend themselves well to doing multi-threaded operations anyways. I see in ACCUMULO-533 when you added the SynchronizedIterator, you didn't see any performance impact. I'm wondering if it would make sense to add an option to the client API that allows configuration of the SynchronizedIterator (or the filter as your patch changes it to be). Thoughts?
          Hide
          Adam Fuchs added a comment -

          Josh Elser In this patch I moved the synchronization to the filter, so the functionality is still there. Along the way, I did play around with removing the SyncronizedIterator altogether in the case where no client iterators are present. I didn't take specific performance measurements isolating that change, so I'm not sure if it's more than the order of 1% referenced in ACCUMULO-533. However, it does shorten the stack, which seems to be related to the funkiness around ACCUMULO-3067. I could subjectively say that the performance difference between not having the Synchronized iterator at all and moving its functionality into the VisibilityFilter was small enough to be hidden in the testing noise, but I don't have evidence to objectively state that.

          Show
          Adam Fuchs added a comment - Josh Elser In this patch I moved the synchronization to the filter, so the functionality is still there. Along the way, I did play around with removing the SyncronizedIterator altogether in the case where no client iterators are present. I didn't take specific performance measurements isolating that change, so I'm not sure if it's more than the order of 1% referenced in ACCUMULO-533 . However, it does shorten the stack, which seems to be related to the funkiness around ACCUMULO-3067 . I could subjectively say that the performance difference between not having the Synchronized iterator at all and moving its functionality into the VisibilityFilter was small enough to be hidden in the testing noise, but I don't have evidence to objectively state that.
          Hide
          Josh Elser added a comment -

          I moved the synchronization to the filter, so the functionality is still there

          Right, I was just noting that you came to the same conclusion about being able to move it.

          I did play around with removing the SyncronizedIterator altogether in the case where no client iterators are present

          Hrmm. I wonder how that works for compaction time iterators? Mostly a rhetorical question since you didn't include that change here.

          I could subjectively say that the performance difference between not having the Synchronized iterator at all and moving its functionality into the VisibilityFilter was small enough to be hidden in the testing noise, but I don't have evidence to objectively state that.

          Ok, I didn't necessarily expect you to have specifics for that either. I brought it up mostly because I was curious. While having the synchronized "barrier" in the iterator stack does make isolate user iterators from mucking up the system iterators, I think I was really wondering if it even makes sense to "encourage" users to write multi-threaded iterators. I know I personally haven't come across any use cases where multiple threads are used effectively by an iterator. Would it make sense to remove the synchronized iterator(filter) from the default "pipeline" and wire up something through the client API and/or table configuration that can let users enable it.

          Not implying that this should be done for this ticket, it's just a parallel thought that has some relevancy here.

          Show
          Josh Elser added a comment - I moved the synchronization to the filter, so the functionality is still there Right, I was just noting that you came to the same conclusion about being able to move it. I did play around with removing the SyncronizedIterator altogether in the case where no client iterators are present Hrmm. I wonder how that works for compaction time iterators? Mostly a rhetorical question since you didn't include that change here. I could subjectively say that the performance difference between not having the Synchronized iterator at all and moving its functionality into the VisibilityFilter was small enough to be hidden in the testing noise, but I don't have evidence to objectively state that. Ok, I didn't necessarily expect you to have specifics for that either. I brought it up mostly because I was curious. While having the synchronized "barrier" in the iterator stack does make isolate user iterators from mucking up the system iterators, I think I was really wondering if it even makes sense to "encourage" users to write multi-threaded iterators. I know I personally haven't come across any use cases where multiple threads are used effectively by an iterator. Would it make sense to remove the synchronized iterator(filter) from the default "pipeline" and wire up something through the client API and/or table configuration that can let users enable it. Not implying that this should be done for this ticket, it's just a parallel thought that has some relevancy here.
          Hide
          Adam Fuchs added a comment -

          Good question on the compaction scope changes. Since IteratorUtil is used in all scopes, the attached patch does remove the synchronization from the compaction scope. We don't generally do visibility filtering in the compaction scope, so I don't suppose the security vulnerability applies. Do you see any reason why we would need synchronization on the compaction scope?

          I was really wondering if it even makes sense to "encourage" users to write multi-threaded iterators.

          I don't believe I have ever come across a use case in which multiple threads are needed (or useful) within an iterator either. From a security perspective, another approach to solving this problem would be to use the java SecurityManager to prevent iterators from using multiple threads.

          Show
          Adam Fuchs added a comment - Good question on the compaction scope changes. Since IteratorUtil is used in all scopes, the attached patch does remove the synchronization from the compaction scope. We don't generally do visibility filtering in the compaction scope, so I don't suppose the security vulnerability applies. Do you see any reason why we would need synchronization on the compaction scope? I was really wondering if it even makes sense to "encourage" users to write multi-threaded iterators. I don't believe I have ever come across a use case in which multiple threads are needed (or useful) within an iterator either. From a security perspective, another approach to solving this problem would be to use the java SecurityManager to prevent iterators from using multiple threads.
          Hide
          Josh Elser added a comment -

          Do you see any reason why we would need synchronization on the compaction scope?

          It's possible that a user configured their own crazy com.multi.threaded.MyIterator at compaction time which could need synchronization, but that's also a reach given that the use-case for multi-threading is a stretch anyways.

          another approach to solving this problem would be to use the java SecurityManager to prevent iterators from using multiple threads.

          That's certainly stronger than my first thought of documenting somewhere "don't start threads in an iterator" . Hopefully if a user tried to do this on their own, they'd realize that they might be doing something weird when they don't have the hooks to ensure 100% that any threads started also terminate.

          Show
          Josh Elser added a comment - Do you see any reason why we would need synchronization on the compaction scope? It's possible that a user configured their own crazy com.multi.threaded.MyIterator at compaction time which could need synchronization, but that's also a reach given that the use-case for multi-threading is a stretch anyways. another approach to solving this problem would be to use the java SecurityManager to prevent iterators from using multiple threads. That's certainly stronger than my first thought of documenting somewhere "don't start threads in an iterator" . Hopefully if a user tried to do this on their own, they'd realize that they might be doing something weird when they don't have the hooks to ensure 100% that any threads started also terminate.
          Hide
          Josh Elser added a comment -

          Upped this to critical as the improvements are significant. It would be great to get this committed.

          Show
          Josh Elser added a comment - Upped this to critical as the improvements are significant. It would be great to get this committed.

            People

            • Assignee:
              Adam Fuchs
              Reporter:
              Adam Fuchs
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:

                Development