  1. Accumulo
  2. ACCUMULO-3835

TabletServerBatchReaderIterator concurrency contention on early close()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.2
    • Fix Version/s: 1.8.0
    • Component/s: client
    • Labels:
      None

      Description

      When running many batch scanners in a JVM, we see a significant amount of cache invalidation when we close a batch scanner prematurely. If the batch scanner is closed while threads within it are still running, an interrupt is sent to those threads, causing them to add the extents to the failures map and invalidate the cache. This causes contention on the write lock of the TabletLocatorImpl.

      The lock contention hinders performance in a highly parallel client that does not need to invalidate the cache as a result of being stopped.

      As a positive test to confirm this was my problem, I added a conditional check to the IOException handler in the run method of TabletServerBatchReaderIterator. The conditional checks whether the query thread pool has been shut down. If it has not, we invalidate the cache, since the failure was probably not caused by an interruption. If it has, we do not invalidate the cache. This reduced lock contention dramatically and reduced runtime. It should cause no harm, as any other failure would cause cache invalidation through some other route.
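
The check described above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Accumulo patch: `EarlyCloseSketch`, `LocatorCache`, `Lookup`, `runLookup`, and `invalidationsAfterFailure` are hypothetical names invented for this example, and the real handler also records failed extents in the failures map, which is omitted here. The only real API relied on is `ExecutorService.isShutdown()`.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class EarlyCloseSketch {

    /** Hypothetical stand-in for the tablet locator cache; it only counts invalidations. */
    static final class LocatorCache {
        final AtomicInteger invalidations = new AtomicInteger();

        void invalidate(String extent) {
            // The real cache would take a write lock here, which is the contended resource.
            invalidations.incrementAndGet();
        }
    }

    /** Hypothetical stand-in for a tablet lookup that may fail with an IOException. */
    interface Lookup {
        void run() throws IOException;
    }

    /** Runs one lookup and invalidates the cache only if the pool is still open. */
    static void runLookup(ExecutorService queryThreadPool, LocatorCache cache,
                          String extent, Lookup lookup) {
        try {
            lookup.run();
        } catch (IOException e) {
            // If the pool was shut down, assume the failure came from close()
            // interrupting this thread, so the cached location is still good.
            if (!queryThreadPool.isShutdown()) {
                cache.invalidate(extent);
            }
        }
    }

    /** Simulates one failing lookup and returns how many invalidations it caused. */
    static int invalidationsAfterFailure(boolean closeFirst) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        LocatorCache cache = new LocatorCache();
        if (closeFirst) {
            pool.shutdownNow(); // models an early close() of the batch scanner
        }
        runLookup(pool, cache, "extent-1", () -> {
            throw new IOException("simulated failure");
        });
        pool.shutdownNow();
        return cache.invalidations.get();
    }

    public static void main(String[] args) {
        System.out.println("pool open:   " + invalidationsAfterFailure(false));
        System.out.println("pool closed: " + invalidationsAfterFailure(true));
    }
}
```

In this sketch a failure while the pool is open still invalidates the cache, while a failure after shutdown skips the invalidation and therefore never touches the write lock.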

        Activity

        elserj Josh Elser added a comment -

        Seems reasonable to me. We should be able to discern the case when close() was called and fail out gracefully and quickly.

        ctubbsii Christopher Tubbs added a comment -

        Do we want to backport this bugfix to 1.6.5 or 1.7.1, since it was originally reported against 1.6.2?

        elserj Josh Elser added a comment -

        +1 on a backport at first glance.


          People

          • Assignee:
            ecn Eric Newton
            Reporter:
            phrocker marco polo
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Not Specified
              Remaining:
              0h
              Logged:
              40m
