HBase
  1. HBase
  2. HBASE-10205

ConcurrentModificationException in BucketAllocator

    Details

    • Hadoop Flags:
      Reviewed

      Description

      The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the RAM queue containing entries to be cached. freeSpace() in turn calls BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), which iterates over 'bucketList'. At the same time another WriterThread might call BucketAllocator.allocateBlock(), which may call BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently cause a ConcurrentModificationException. Calls to BucketAllocator.allocateBlock() are synchronized, but calls to BucketAllocator.getIndexStatistics() are not, which allows this race to occur.

        Issue Links

          Activity

          Hide
          chunhui shen added a comment -

          Make a patch for trunk.

          Thanks Arjen's found

          Show
          chunhui shen added a comment - Make a patch for trunk. Thanks Arjen's found
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12619494/hbase-10205-trunk.patch
          against trunk revision .
          ATTACHMENT ID: 12619494

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619494/hbase-10205-trunk.patch against trunk revision . ATTACHMENT ID: 12619494 +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop1.0 . The patch compiles against the hadoop 1.0 profile. +1 hadoop1.1 . The patch compiles against the hadoop 1.1 profile. +1 javadoc . The javadoc tool did not generate any warning messages. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 lineLengths . The patch does not introduce lines longer than 100 -1 site . The patch appears to cause mvn site goal to fail. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//console This message is automatically generated.
          Hide
          Ted Yu added a comment -

          Patch looks good.

          Show
          Ted Yu added a comment - Patch looks good.
          Hide
          Andrew Purtell added a comment -

          +1, please consider commit to other affected branches also

          Show
          Andrew Purtell added a comment - +1, please consider commit to other affected branches also
          Hide
          Arjen Roodselaar added a comment -

          I took the Chunhui's patch, but made some additional changes to allow parallelism in the BucketAllocator on the BucketSizeInfo level. I think synchronization can be removed from BucketAllocator.allocateBlock() and BucketAllocator.freeBlock(), but need to verify this under some stress. I will be running a load test over the next 24 hours and report back.

          Show
          Arjen Roodselaar added a comment - I took the Chunhui's patch, but made some additional changes to allow parallelism in the BucketAllocator on the BucketSizeInfo level. I think synchronization can be removed from BucketAllocator.allocateBlock() and BucketAllocator.freeBlock(), but need to verify this under some stress. I will be running a load test over the next 24 hours and report back.
          Hide
          chunhui shen added a comment -

          Arjen Roodselaar
          As my understand to remove the synchronization from BucketAllocator.allocateBlock() and BucketAllocator.freeBlock(), we should add synchronization for BucketSizeInfo#allocateBlock and BucketSizeInfo#findAndRemoveCompletelyFreeBucket, it may cause dead lock.

          Another solution is catch the ConcurrentModificationException and take a retry since it happens occasionally

          Show
          chunhui shen added a comment - Arjen Roodselaar As my understand to remove the synchronization from BucketAllocator.allocateBlock() and BucketAllocator.freeBlock(), we should add synchronization for BucketSizeInfo#allocateBlock and BucketSizeInfo#findAndRemoveCompletelyFreeBucket, it may cause dead lock. Another solution is catch the ConcurrentModificationException and take a retry since it happens occasionally
          Hide
          Arjen Roodselaar added a comment -

          Fair point, I should have thought about this a bit harder. Since we want multiple writer threads to deal with the possible fsync'ing to a device but the off heap implementation will probably suffer from contention on the BucketAllocator, does it make sense to just assume the BucketAllocator to not be thread safe, and instead have an AllocatorThread pulling entries from a LinkedTransferQueue, assign the bucket and offset (and perform freeSpace if necessary) and hand them over to one of the writer threads?

          Show
          Arjen Roodselaar added a comment - Fair point, I should have thought about this a bit harder. Since we want multiple writer threads to deal with the possible fsync'ing to a device but the off heap implementation will probably suffer from contention on the BucketAllocator, does it make sense to just assume the BucketAllocator to not be thread safe, and instead have an AllocatorThread pulling entries from a LinkedTransferQueue, assign the bucket and offset (and perform freeSpace if necessary) and hand them over to one of the writer threads?
          Hide
          chunhui shen added a comment -

          I think the contention on the BucketAllocator won't be the bottleneck of writer threads.

          AllocateBlock() are only triggerd after we miss the block from cache and get block from HDFS,
          In that case, IOPS is the bottleneck of system.

          Show
          chunhui shen added a comment - I think the contention on the BucketAllocator won't be the bottleneck of writer threads. AllocateBlock() are only triggerd after we miss the block from cache and get block from HDFS, In that case, IOPS is the bottleneck of system.
          Hide
          stack added a comment -

          Applied to master, branch-1 and 0.98. Ran it for a while on cluster and perf seemed same as w/o (and it worked). Thanks Arjen Roodselaar and chunhui shen

          Show
          stack added a comment - Applied to master, branch-1 and 0.98. Ran it for a while on cluster and perf seemed same as w/o (and it worked). Thanks Arjen Roodselaar and chunhui shen
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98 #440 (See https://builds.apache.org/job/HBase-0.98/440/)
          HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev 2c55970ed3de5dac551d8eaa5e2914c18d814962)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
          Show
          Hudson added a comment - SUCCESS: Integrated in HBase-0.98 #440 (See https://builds.apache.org/job/HBase-0.98/440/ ) HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev 2c55970ed3de5dac551d8eaa5e2914c18d814962) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #414 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/414/)
          HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev 2c55970ed3de5dac551d8eaa5e2914c18d814962)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
          Show
          Hudson added a comment - SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #414 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/414/ ) HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev 2c55970ed3de5dac551d8eaa5e2914c18d814962) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #5377 (See https://builds.apache.org/job/HBase-TRUNK/5377/)
          HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev e17a3ca0913893d59562d8bfb40da4d70b3e39c7)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #5377 (See https://builds.apache.org/job/HBase-TRUNK/5377/ ) HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev e17a3ca0913893d59562d8bfb40da4d70b3e39c7) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
          Hide
          Enis Soztutar added a comment -

          Closing this issue after 0.99.0 release.

          Show
          Enis Soztutar added a comment - Closing this issue after 0.99.0 release.
          Hide
          deepankar added a comment -

          stack and Enis Soztutar Looks like this patch has not been committed on branch-1, is there something I am missing, or it is missed by mistake ? I checked by using the command git log --pretty=format:"%ad %h %s %an" --date=short | grep "HBASE-10205".

          Thanks

          Show
          deepankar added a comment - stack and Enis Soztutar Looks like this patch has not been committed on branch-1, is there something I am missing, or it is missed by mistake ? I checked by using the command git log --pretty=format:"%ad %h %s %an" --date=short | grep " HBASE-10205 " . Thanks
          Hide
          Andrew Purtell added a comment -

          That's a correct observation, this was only committed to the 'master' branch and 0.98. Let me look at putting it into the 1.x branches now.

          Show
          Andrew Purtell added a comment - That's a correct observation, this was only committed to the 'master' branch and 0.98. Let me look at putting it into the 1.x branches now.
          Hide
          Andrew Purtell added a comment -

          I will commit to branch-1 only for now after running the unit test suite, since it is already in master and 0.98, although it should go into branch-1.1 and branch-1.2 assuming the RMs for those branches are ok with the changes. The patch changes the type of the bucket lists and adds synchronization to some methods in BucketAllocator. Unfortunately it also has this code smell in BucketCache:

          +    } catch (Throwable t) {
          +      LOG.warn("Failed freeing space", t);
          

          /cc Nick Dimiduk Sean Busbey

          Show
          Andrew Purtell added a comment - I will commit to branch-1 only for now after running the unit test suite, since it is already in master and 0.98, although it should go into branch-1.1 and branch-1.2 assuming the RMs for those branches are ok with the changes. The patch changes the type of the bucket lists and adds synchronization to some methods in BucketAllocator. Unfortunately it also has this code smell in BucketCache: + } catch (Throwable t) { + LOG.warn( "Failed freeing space" , t); /cc Nick Dimiduk Sean Busbey
          Hide
          Andrew Purtell added a comment -

          Actually I am concerned enough about this change that I will open a new JIRA. This one is closed.

          Show
          Andrew Purtell added a comment - Actually I am concerned enough about this change that I will open a new JIRA. This one is closed.
          Hide
          stack added a comment -

          deepankar See HBASE-15691 where Andy opened a follow on...

          Show
          stack added a comment - deepankar See HBASE-15691 where Andy opened a follow on...
          Hide
          Sean Busbey added a comment -

          should 0.99.0 be removed from the fix list?

          Show
          Sean Busbey added a comment - should 0.99.0 be removed from the fix list?

            People

            • Assignee:
              Arjen Roodselaar
              Reporter:
              Arjen Roodselaar
            • Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development