HBase
  1. HBase
  2. HBASE-10205

ConcurrentModificationException in BucketAllocator

    Details

    • Hadoop Flags:
      Reviewed

      Description

      The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the RAM queue containing entries to be cached. freeSpace() in turn calls BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), which iterates over 'bucketList'. At the same time another WriterThread might call BucketAllocator.allocateBlock(), which may call BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently cause a ConcurrentModificationException. Calls to BucketAllocator.allocateBlock() are synchronized, but calls to BucketAllocator.getIndexStatistics() are not, which allows this race to occur.

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Patch Available Patch Available
        3h 15m 1 chunhui shen 19/Dec/13 06:11
        Patch Available Patch Available Resolved Resolved
        230d 16h 47m 1 stack 06/Aug/14 23:58
        Resolved Resolved Closed Closed
        199d 32m 1 Enis Soztutar 21/Feb/15 23:31
        Enis Soztutar made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Enis Soztutar added a comment -

        Closing this issue after 0.99.0 release.

        Show
        Enis Soztutar added a comment - Closing this issue after 0.99.0 release.
        Hide
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK #5377 (See https://builds.apache.org/job/HBase-TRUNK/5377/)
        HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev e17a3ca0913893d59562d8bfb40da4d70b3e39c7)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
        • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
        Show
        Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #5377 (See https://builds.apache.org/job/HBase-TRUNK/5377/ ) HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev e17a3ca0913893d59562d8bfb40da4d70b3e39c7) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #414 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/414/)
        HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev 2c55970ed3de5dac551d8eaa5e2914c18d814962)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
        • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
        Show
        Hudson added a comment - SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #414 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/414/ ) HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev 2c55970ed3de5dac551d8eaa5e2914c18d814962) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.98 #440 (See https://builds.apache.org/job/HBase-0.98/440/)
        HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev 2c55970ed3de5dac551d8eaa5e2914c18d814962)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
        • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
        Show
        Hudson added a comment - SUCCESS: Integrated in HBase-0.98 #440 (See https://builds.apache.org/job/HBase-0.98/440/ ) HBASE-10205 ConcurrentModificationException in BucketAllocator (Arjen Roodselaar and Chunhui Shen) (stack: rev 2c55970ed3de5dac551d8eaa5e2914c18d814962) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
        stack made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 2.0.0 [ 12327188 ]
        Fix Version/s 0.98.6 [ 12327376 ]
        Resolution Fixed [ 1 ]
        Hide
        stack added a comment -

        Applied to master, branch-1 and 0.98. Ran it for a while on cluster and perf seemed same as w/o (and it worked). Thanks Arjen Roodselaar and chunhui shen

        Show
        stack added a comment - Applied to master, branch-1 and 0.98. Ran it for a while on cluster and perf seemed same as w/o (and it worked). Thanks Arjen Roodselaar and chunhui shen
        Hide
        chunhui shen added a comment -

        I think the contention on the BucketAllocator won't be the bottleneck of writer threads.

        AllocateBlock() are only triggerd after we miss the block from cache and get block from HDFS,
        In that case, IOPS is the bottleneck of system.

        Show
        chunhui shen added a comment - I think the contention on the BucketAllocator won't be the bottleneck of writer threads. AllocateBlock() are only triggerd after we miss the block from cache and get block from HDFS, In that case, IOPS is the bottleneck of system.
        Hide
        Arjen Roodselaar added a comment -

        Fair point, I should have thought about this a bit harder. Since we want multiple writer threads to deal with the possible fsync'ing to a device but the off heap implementation will probably suffer from contention on the BucketAllocator, does it make sense to just assume the BucketAllocator to not be thread safe, and instead have an AllocatorThread pulling entries from a LinkedTransferQueue, assign the bucket and offset (and perform freeSpace if necessary) and hand them over to one of the writer threads?

        Show
        Arjen Roodselaar added a comment - Fair point, I should have thought about this a bit harder. Since we want multiple writer threads to deal with the possible fsync'ing to a device but the off heap implementation will probably suffer from contention on the BucketAllocator, does it make sense to just assume the BucketAllocator to not be thread safe, and instead have an AllocatorThread pulling entries from a LinkedTransferQueue, assign the bucket and offset (and perform freeSpace if necessary) and hand them over to one of the writer threads?
        Hide
        chunhui shen added a comment -

        Arjen Roodselaar
        As my understand to remove the synchronization from BucketAllocator.allocateBlock() and BucketAllocator.freeBlock(), we should add synchronization for BucketSizeInfo#allocateBlock and BucketSizeInfo#findAndRemoveCompletelyFreeBucket, it may cause dead lock.

        Another solution is catch the ConcurrentModificationException and take a retry since it happens occasionally

        Show
        chunhui shen added a comment - Arjen Roodselaar As my understand to remove the synchronization from BucketAllocator.allocateBlock() and BucketAllocator.freeBlock(), we should add synchronization for BucketSizeInfo#allocateBlock and BucketSizeInfo#findAndRemoveCompletelyFreeBucket, it may cause dead lock. Another solution is catch the ConcurrentModificationException and take a retry since it happens occasionally
        Hide
        Arjen Roodselaar added a comment -

        I took the Chunhui's patch, but made some additional changes to allow parallelism in the BucketAllocator on the BucketSizeInfo level. I think synchronization can be removed from BucketAllocator.allocateBlock() and BucketAllocator.freeBlock(), but need to verify this under some stress. I will be running a load test over the next 24 hours and report back.

        Show
        Arjen Roodselaar added a comment - I took the Chunhui's patch, but made some additional changes to allow parallelism in the BucketAllocator on the BucketSizeInfo level. I think synchronization can be removed from BucketAllocator.allocateBlock() and BucketAllocator.freeBlock(), but need to verify this under some stress. I will be running a load test over the next 24 hours and report back.
        Hide
        Andrew Purtell added a comment -

        +1, please consider commit to other affected branches also

        Show
        Andrew Purtell added a comment - +1, please consider commit to other affected branches also
        Hide
        Ted Yu added a comment -

        Patch looks good.

        Show
        Ted Yu added a comment - Patch looks good.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12619494/hbase-10205-trunk.patch
        against trunk revision .
        ATTACHMENT ID: 12619494

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619494/hbase-10205-trunk.patch against trunk revision . ATTACHMENT ID: 12619494 +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop1.0 . The patch compiles against the hadoop 1.0 profile. +1 hadoop1.1 . The patch compiles against the hadoop 1.1 profile. +1 javadoc . The javadoc tool did not generate any warning messages. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 lineLengths . The patch does not introduce lines longer than 100 -1 site . The patch appears to cause mvn site goal to fail. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8227//console This message is automatically generated.
        chunhui shen made changes -
        Fix Version/s 0.99.0 [ 12325675 ]
        chunhui shen made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        chunhui shen made changes -
        Attachment hbase-10205-trunk.patch [ 12619494 ]
        Hide
        chunhui shen added a comment -

        Make a patch for trunk.

        Thanks Arjen's found

        Show
        chunhui shen added a comment - Make a patch for trunk. Thanks Arjen's found
        Arjen Roodselaar made changes -
        Component/s regionserver [ 12312139 ]
        Arjen Roodselaar made changes -
        Field Original Value New Value
        Description The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the RAM queue containing entries to be cached. freeSpace() in turn calls BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), which iterates 'bucketList'. At the same time another WriterThread might call BucketAllocator.allocateBlock(), which may call BucketSizeInfo.allocate() and add a bucket to 'bucketList' and consequently cause a ConcurrentModificationException. Calls to BucketAllocator.allocateBlock() are synchronized, but calls to BucketAllocator.getIndexStatistics() are not, which allows this race to occur. The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the RAM queue containing entries to be cached. freeSpace() in turn calls BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), which iterates over 'bucketList'. At the same time another WriterThread might call BucketAllocator.allocateBlock(), which may call BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently cause a ConcurrentModificationException. Calls to BucketAllocator.allocateBlock() are synchronized, but calls to BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
        Arjen Roodselaar created issue -

          People

          • Assignee:
            Arjen Roodselaar
            Reporter:
            Arjen Roodselaar
          • Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development