HBase / HBASE-11368

Multi-column family BulkLoad fails if compactions go on too long

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Compactions take a read lock on the region. For a multi-column family region, the bulk load takes a write lock on the region before loading. If the compaction holds its read lock for too long, the bulk load times out and fails.

      Various recipes include:
      + Making smaller regions (lame)
      + Victor Xu suggests major compacting just before bulk loading over in HBASE-10882 as a workaround.

      Does the compaction need a read lock for that long? Does the bulk load need a full write lock when there are multiple column families? Can we fail more gracefully at least?
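
      The interaction can be reproduced outside HBase with a plain ReentrantReadWriteLock. The sketch below is illustrative only (the class name RegionLockDemo and the timings are made up, not HBase code): a long "compaction" holds the read lock while a multi-CF "bulk load" gives up on the write lock after 60 seconds, which is what surfaces to clients as RegionTooBusyException.

        import java.util.concurrent.TimeUnit;
        import java.util.concurrent.locks.ReentrantReadWriteLock;

        public class RegionLockDemo {
          // Stand-in for HRegion#lock: compactions and reads take the read lock,
          // region close (and, via HBASE-4552, multi-CF bulk load) takes the write lock.
          private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

          public static void main(String[] args) throws Exception {
            Thread compaction = new Thread(() -> {
              lock.readLock().lock();
              try {
                Thread.sleep(90_000);              // simulate a long-running compaction
              } catch (InterruptedException ignored) {
              } finally {
                lock.readLock().unlock();
              }
            });
            compaction.start();
            Thread.sleep(100);                     // let the compaction grab the read lock first

            // Multi-CF bulk load path: needs the write lock and gives up after 60s.
            if (!lock.writeLock().tryLock(60, TimeUnit.SECONDS)) {
              System.out.println("failed to get a lock in 60000ms -> bulk load fails");
            } else {
              lock.writeLock().unlock();
            }
            compaction.interrupt();
          }
        }

      In HBase, scanners also take the region read lock, so once the bulk load is queued for the write lock they back up behind it as well, which is the pile-up described in the comments below.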

      1. hbase-11368-0.98.5.patch
        4 kB
        Qiang Tian
      2. hbase11368-master.patch
        4 kB
        Qiang Tian
      3. key_stacktrace_hbase10882.TXT
        11 kB
        Qiang Tian
      4. performance_improvement_verification_98.5.patch
        7 kB
        Qiang Tian

        Activity

        stack created issue -
        stack added a comment - This is an old issue, http://search-hadoop.com/m/0AGoj1C1AXY/org.apache.hadoop.hbase.RegionTooBusyException%253A+failed+to+get+a+lock+in+60000ms&subj=Bulk+loading+HFiles+via+LoadIncrementalHFiles+fails+at+a+region+that+is+being+compacted+a+bug+
        Qiang Tian added a comment -

        As stack mentioned in http://search-hadoop.com/m/DHED4NR0wT, HRegion#lock is meant to protect region close. The comments in HRegion.java, and the fact that only HRegion#doClose takes the write lock (if we do not consider HRegion#startBulkRegionOperation), also show that.

        So using HRegion#lock to protect multi-CF bulkload in HBASE-4552 looks too heavyweight?
        From the stacktrace of HBASE-10882, all reads/scans are blocked because bulkload is waiting for lock.writeLock, while compaction has already acquired lock.readLock and is reading data, a time-consuming operation.

        A related topic is discussed again in http://search-hadoop.com/m/DHED4I11p31. Perhaps we need another region-level lock.

        Qiang Tian added a comment -

        Ideas for lowering the lock granularity (based on the 0.98.5 code base):

        1) read/scan
        Is this the primary goal of atomic multi-CF bulkload in HBASE-4552?

        After DefaultStoreFileManager#storefiles is updated in HStore#bulkLoadHFile, notifyChangedReadersObservers is called to reset the StoreScanner#heap, so checkReseek->resetScannerStack will be triggered on the next scan/read to recreate store scanners based on the new storefiles (sketched after this comment).

        So we could introduce a new region-level rwlock, multiCFLock: HRegion#bulkLoadHFiles acquires the write lock before the multi-CF HStore#bulkLoadHFile calls, and StoreScanner#resetScannerStack acquires the read lock. This way the scanners are recreated only after all CFs' store files are populated.

        2) split region
        The region will be closed in SplitTransaction#stepsBeforePONR, which falls inside the HRegion#lock protection area. Bulk load still needs to acquire its read lock at the start.

        3) memstore flush
        We flush to a new file which is not related to the loaded files.

        4) compaction
        Compaction is performed store by store. If bulkload inserts new files into storefiles during the selectCompaction process, the file list to be compacted might be affected; e.g., the compaction for some CFs may not include the newly loaded files while others might. But this does not impact data integrity or read behavior?
        At the end of compaction, storefiles access is still protected by HStore#lock if there is a bulk load change to the same CF.

        Comments?
        Thanks
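
        A minimal, self-contained sketch of the per-CF refresh flow described in 1) above. The class and method names (ToyStore, ToyStoreScanner, snapshotFiles) are simplified stand-ins for HStore, StoreScanner, DefaultStoreFileManager#storefiles and notifyChangedReadersObservers, not HBase code, and all locking is omitted:

          import java.util.ArrayList;
          import java.util.List;

          public class ScannerResetSketch {
            /** Stand-in for StoreScanner: rebuilds its view lazily after a change notification. */
            static final class ToyStoreScanner {
              private final ToyStore store;
              private List<String> heap;                 // StoreScanner#heap analogue

              ToyStoreScanner(ToyStore store) {
                this.store = store;
                this.heap = store.snapshotFiles();
              }

              void updateReaders() { this.heap = null; } // forces a reset on the next read

              List<String> next() {
                if (heap == null) {                      // checkReseek -> resetScannerStack analogue
                  heap = store.snapshotFiles();
                }
                return heap;
              }
            }

            /** Stand-in for HStore plus its storefiles list. */
            static final class ToyStore {
              private final List<String> storefiles = new ArrayList<>();
              private final List<ToyStoreScanner> observers = new ArrayList<>();

              ToyStoreScanner newScanner() {
                ToyStoreScanner s = new ToyStoreScanner(this);
                observers.add(s);
                return s;
              }

              List<String> snapshotFiles() { return new ArrayList<>(storefiles); }

              // HStore#bulkLoadHFile analogue: add the file, then notify scanners.
              void bulkLoadHFile(String hfile) {
                storefiles.add(hfile);
                observers.forEach(ToyStoreScanner::updateReaders);  // notifyChangedReadersObservers analogue
              }
            }

            public static void main(String[] args) {
              ToyStore f1 = new ToyStore();
              ToyStoreScanner scanner = f1.newScanner();
              f1.bulkLoadHFile("f1/hfile-001");
              System.out.println(scanner.next());        // sees the new file after the reset
            }
          }

        In HBase, the add-to-storefiles and notify steps run under HStore#lock, which is what gives per-CF consistency; the question in this issue is how to get the same effect across all CFs of a region without HRegion#lock.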

        Jerry He added a comment -

        Hi, Qiang Tian

        The idea may be feasible:

        bulk load begins: acquire region read lock + new bulk load write lock.
        Scan/next begins: acquire region read lock + new bulk load read lock.
        Other region operations: only acquire region read lock.

        This will keep the compaction and the bulk load from blocking each other (a sketch follows this comment).

        Do you mind drafting a patch and running it through the test suite?
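
        A minimal sketch of the locking protocol proposed above, assuming a hypothetical second region-level lock (called bulkLoadLock here) next to the existing HRegion close lock. This is not the attached patch, and a later comment notes that, combined with StoreScanner's internal lock, this acquisition order can deadlock:

          import java.util.concurrent.locks.ReentrantReadWriteLock;

          public class RegionLocks {
            // Existing HRegion#lock: protects region close.
            private final ReentrantReadWriteLock closeLock = new ReentrantReadWriteLock();
            // Hypothetical new lock: only bulk load and scans care about it.
            private final ReentrantReadWriteLock bulkLoadLock = new ReentrantReadWriteLock();

            void bulkLoadHFiles(Runnable loadAllFamilies) {
              closeLock.readLock().lock();          // region read lock: block close/split only
              bulkLoadLock.writeLock().lock();      // exclude concurrent scans, not compactions
              try {
                loadAllFamilies.run();
              } finally {
                bulkLoadLock.writeLock().unlock();
                closeLock.readLock().unlock();
              }
            }

            void scannerNext(Runnable readOneBatch) {
              closeLock.readLock().lock();          // region read lock, as before
              bulkLoadLock.readLock().lock();       // see all CFs either before or after the load
              try {
                readOneBatch.run();
              } finally {
                bulkLoadLock.readLock().unlock();
                closeLock.readLock().unlock();
              }
            }

            void compact(Runnable compactOneStore) {
              closeLock.readLock().lock();          // unchanged: compaction never touches bulkLoadLock
              try {
                compactOneStore.run();
              } finally {
                closeLock.readLock().unlock();
              }
            }
          }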

        Jerry He added a comment -

        Other region operations: only acquire region read lock

        --> Other region operations: only acquire region read or write lock, no change from existing behavior.

        Qiang Tian made changes -
        Assignee Qiang Tian [ tianq ]
        Qiang Tian added a comment -

        Thanks Jerry He,
        Is this the right way to run the bulkload test: mvn test -Dtest=TestHRegionServerBulkLoad ?
        The test is supposed to run for 5 minutes, but it exits after only about 1 minute. Is that expected?

        Jerry He added a comment -
          /**
           * Atomic bulk load.
           */
          @Test
          public void testAtomicBulkLoad() throws Exception {
            String TABLE_NAME = "atomicBulkLoad";
        
            int millisToRun = 30000;
        

        This test case runs for 30 seconds.

          /**
           * Run test on an HBase instance for 5 minutes. This assumes that the table
           * under test only has a single region.
           */
          public static void main(String args[]) throws Exception {
        

        main is not invoked during JUnit run.

        Qiang Tian added a comment -

        Update:
        This idea would cause a deadlock, since bulkload and the scanner acquire the bulkload lock and StoreScanner.lock in different orders. Will look at whether we can lower the granularity of the StoreScanner lock.

        Qiang Tian added a comment -

        I forgot that StoreScanner is per-CF, so the earlier analysis is wrong:

        After DefaultStoreFileManager#storefiles is updated in HStore#bulkLoadHFile, notifyChangedReadersObservers is called to reset the StoreScanner#heap, so checkReseek->resetScannerStack will be triggered on the next scan/read to recreate store scanners based on the new storefiles.

        so we could introduce a new region-level rwlock, multiCFLock: HRegion#bulkLoadHFiles acquires the write lock before the multi-CF HStore#bulkLoadHFile calls, and StoreScanner#resetScannerStack acquires the read lock. this way the scanners are recreated only after all CFs' store files are populated.

        Instead, the new lock should be placed at the RegionScanner layer. See the attached patch.

        "mvn test" and "TestHRegionServerBulkLoad" (the large test for atomic bulkload) passed; I still need to run the large test suite and a performance test (any suggestions for it? YCSB?).

        The lock can be further limited to a smaller scope by splitting HStore#bulkLoadHFile into two parts: 1) rename the bulkload files and add the new files to the store file list; 2) notifyChangedReadersObservers. Only #2 needs the lock (a sketch follows this comment).
        If the HDFS file rename is fast, the split may not be needed.
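
        A minimal sketch of the two-part split suggested above, using toy classes and a hypothetical regionBulkLoadLock rather than the attached patch: the file moves and store-file-list updates happen first, and only the observer notification runs under the region-level write lock, keeping the critical section short.

          import java.util.ArrayList;
          import java.util.LinkedHashMap;
          import java.util.List;
          import java.util.Map;
          import java.util.concurrent.locks.ReentrantReadWriteLock;

          public class TwoPhaseRegionBulkLoad {
            /** Toy per-CF store: a file list plus observers that reset scanners when it changes. */
            static final class ToyStore {
              final List<String> storefiles = new ArrayList<>();
              final List<Runnable> changedReadersObservers = new ArrayList<>();

              void addBulkLoadedFile(String hfile) {                 // part 1
                storefiles.add(hfile);
              }

              void notifyChangedReadersObservers() {                 // part 2
                changedReadersObservers.forEach(Runnable::run);
              }
            }

            private final Map<String, ToyStore> stores = new LinkedHashMap<>();
            // Hypothetical region-level lock; RegionScanner#next would take its read lock.
            private final ReentrantReadWriteLock regionBulkLoadLock = new ReentrantReadWriteLock();

            TwoPhaseRegionBulkLoad(Map<String, ToyStore> storesByFamily) {
              this.stores.putAll(storesByFamily);
            }

            /** familyPaths: CF name -> HFiles to load for that CF. */
            void bulkLoadHFiles(Map<String, List<String>> familyPaths) {
              // Part 1 (no region-level lock): move the files and extend each store's file list.
              // In the real code this is the HDFS rename, which may be slow.
              familyPaths.forEach((cf, files) ->
                  files.forEach(f -> stores.get(cf).addBulkLoadedFile(f)));

              // Part 2 (short critical section): only the reader notification runs under the
              // write lock, so concurrent region scanners reset against all CFs' new files together.
              regionBulkLoadLock.writeLock().lock();
              try {
                familyPaths.keySet().forEach(cf -> stores.get(cf).notifyChangedReadersObservers());
              } finally {
                regionBulkLoadLock.writeLock().unlock();
              }
            }
          }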

        Qiang Tian made changes -
        Attachment hbase-11368-0.98.5.patch [ 12674719 ]
        Qiang Tian added a comment -

        It looks to me like the patch will only show its value when there is a long compaction plus gets/scans. Not sure if Victor Xu wants to try it in some test env?
        Thanks.

        Qiang Tian added a comment -

        initial YCSB test:

        Env:

        hadoop 2.2.0
        YCSB 1.0.4(Andrew's branch)
        3 nodes, 1 master, 2 RS (cluster details omitted since this is just to evaluate the new lock)

        Steps:

        Followed Andrew's steps (see http://search-hadoop.com/m/DHED4hl7pC/)
        the seed table has 3 CFs, pre-split to 20 regions
        load 1 million rows to CF 'f1', using workloada
        run 3 iterations of workloadc and workloada respectively. The parameters in each run:

        -p columnfamily=f1 -p operationcount=1000000 -s -threads 10

        Results:

        0.98.5:
        workload c:
        [READ], AverageLatency(us), 496.225811
        [READ], AverageLatency(us), 510.206831
        [READ], AverageLatency(us), 501.256123

        workload a:
        [READ], AverageLatency(us), 676.4527555821747
        [READ], AverageLatency(us), 622.5544771452717
        [READ], AverageLatency(us), 628.1365657163067

        0.98.5+patch:
        workload c:
        [READ], AverageLatency(us), 536.334437
        [READ], AverageLatency(us), 508.405555
        [READ], AverageLatency(us), 491.416182

        workload a:
        [READ], AverageLatency(us), 640.3625218319231
        [READ], AverageLatency(us), 642.9719823488798
        [READ], AverageLatency(us), 631.7491770928287

        Looks like little performance penalty.

        I also ran PE in the cluster; since the test table has only 1 CF, the new lock is actually not used. Interestingly, with the patch the performance is even a bit better...

        Qiang Tian added a comment -

        A simple comparison test using the updated TestHRegionServerBulkLoad.java. The numbers are just for reference; the real performance improvement depends on a combination of factors, such as compaction time, bulkload time, scan/read workload type, request concurrency, etc.

        98.5:
        -----------
        2014-10-24 02:30:03,399 INFO [main] regionserver.TestHRegionServerBulkLoad(345): loaded 16
        2014-10-24 02:30:03,399 INFO [main] regionserver.TestHRegionServerBulkLoad(346): compations 16

        2014-10-24 02:30:03,399 INFO [main] regionserver.TestHRegionServerBulkLoad(348): Scanners:
        //average # with 50 scanners
        2014-10-24 02:30:03,399 INFO [main] regionserver.TestHRegionServerBulkLoad(350): scanned 73
        2014-10-24 02:30:03,400 INFO [main] regionserver.TestHRegionServerBulkLoad(351): verified 18000 rows

        98.5+patch
        --------
        //since bulkload has smaller conflict with compaction, we get more bulkload/compaction request in fixed test cycle(5 minutes)
        2014-10-24 02:41:19,071 INFO [main] regionserver.TestHRegionServerBulkLoad(344): Loaders:
        2014-10-24 02:41:19,072 INFO [main] regionserver.TestHRegionServerBulkLoad(345): loaded 43
        2014-10-24 02:41:19,072 INFO [main] regionserver.TestHRegionServerBulkLoad(346): compations 43

        2014-10-24 02:41:19,073 INFO [main] regionserver.TestHRegionServerBulkLoad(348): Scanners:
        //since bulkload has smaller conflict with scan, we get more scans in fixed test cycle(5 minutes)
        //average # for 50 scanners
        2014-10-24 02:41:19,073 INFO [main] regionserver.TestHRegionServerBulkLoad(350): scanned 92
        2014-10-24 02:41:19,073 INFO [main] regionserver.TestHRegionServerBulkLoad(351): verified 25000 rows

        Qiang Tian made changes -
        Qiang Tian added a comment -

        Hi stack, Andrew Purtell,
        any comments?
        thanks!

        stack added a comment -

        Is this meant to be in the patch?

          LOG.info("###compaction get the closelock, sleep 20s to simulate slow compaction");
          try {
            Thread.sleep(20000);
          } catch (InterruptedException e) {
            LOG.info("###sleep interrupted");
          }

        What change did you do Qiang Tian?

        Qiang Tian added a comment -

        Hi stack,
        Sorry for the confusion. Let me explain from scratch:

        1) The root cause of the problem - HRegion#lock.
        From the stacktrace in HBASE-10882 (also see key_stacktrace_hbase10882.TXT attached), the event sequence is:
        1.1) the compaction acquires the read lock of HRegion#lock;
        1.2) the bulkload tries to acquire the write lock of HRegion#lock because there are multiple CFs. It has to wait for the compaction to release the read lock;
        1.3) scanners try to acquire the read lock of HRegion#lock. They have to wait for the bulkload to release the write lock.
        So both the bulkload and the scanners are blocked on HRegion#lock by the compaction.

        2) What is HRegion#lock used for?
        Investigation of HRegion#lock shows it was originally designed to protect region close ONLY. If someone, such as a region split, wants to close the region, it needs to wait for others to release the read lock.
        Then HBASE-4552 used the lock to solve the multi-CF bulkload consistency issue. Now we see it is too heavy.

        3) Can we avoid using HRegion#lock in bulkload?
        The answer is yes.
        Internally, HStore#DefaultStoreFileManager#storefiles keeps track of the on-disk HFiles for a CF. The bulkload takes these steps:
        3.1) move the HFiles directly into the region directory;
        3.2) add them to the storefiles list;
        3.3) notify StoreScanner that the HFile list has changed, which is done by resetting StoreScanner#heap to null. This forces existing StoreScanner instances to reinitialize, based on the new HFiles seen on disk, on the next scan/read request.
        Steps 3.2 and 3.3 are synchronized by HStore#lock, so we have CF-level scan-bulkload consistency.

        To achieve multi-CF scan-bulkload consistency without HRegion#lock, we still need another region-level lock: a RegionScanner is composed of multiple StoreScanners, and a StoreScanner (a CF scanner) is composed of a MemStoreScanner and multiple StoreFileScanners.

        The RegionScannerImpl#storeHeap (and joinedHeap) is just the entry point over the multiple StoreScanners. To have multi-CF consistency, we need synchronization here - a lock is needed, but it is used only between scan and bulkload (a sketch follows this comment).

        Regarding the code change you referenced, performance_improvement_verification_98.5.patch is there to simulate the event sequence described in #1, for testing purposes only.

        Currently I use 98.5 for the tests since it is stable and easy to evaluate the effect of the change.
        Thanks.
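
        A minimal sketch of the scanner composition and lock placement described above. ToyRegionScanner and ToyStoreScanner are illustrative stand-ins for RegionScannerImpl, its heap of per-CF StoreScanners, and the new lock taken only between scan and bulkload; this is not the actual patch:

          import java.util.List;
          import java.util.PriorityQueue;
          import java.util.concurrent.locks.ReentrantReadWriteLock;

          public class ToyRegionScanner {
            /** Per-CF scanner: in HBase, a StoreScanner over a MemStoreScanner plus StoreFileScanners. */
            interface ToyStoreScanner extends Comparable<ToyStoreScanner> {
              String peek();
              String next();
            }

            // Entry point over all CFs: in HBase, RegionScannerImpl#storeHeap (and joinedHeap).
            private final PriorityQueue<ToyStoreScanner> storeHeap;
            // Hypothetical region-level lock; bulk load takes its write lock around the per-CF
            // notifications, so a next() call sees either none or all of the new HFiles.
            private final ReentrantReadWriteLock regionBulkLoadLock;

            ToyRegionScanner(List<ToyStoreScanner> perCfScanners, ReentrantReadWriteLock lock) {
              this.storeHeap = new PriorityQueue<>(perCfScanners);
              this.regionBulkLoadLock = lock;
            }

            String next() {
              regionBulkLoadLock.readLock().lock();   // only contends with multi-CF bulk load
              try {
                ToyStoreScanner top = storeHeap.poll();
                if (top == null) {
                  return null;
                }
                String cell = top.next();
                if (top.peek() != null) {
                  storeHeap.add(top);                 // re-heapify on the scanner's current key
                }
                return cell;
              } finally {
                regionBulkLoadLock.readLock().unlock();
              }
            }
          }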

        Qiang Tian made changes -
        Attachment key_stacktrace_hbase10882.TXT [ 12677097 ]
        Qiang Tian added a comment -

        The attachments:
        key_stacktrace_hbase10882.TXT: the problem stacktrace
        hbase-11368-0.98.5.patch: the fix
        performance_improvement_verification_98.5.patch: the test case to verify the performance improvement

        Qiang Tian added a comment -

        patch for master branch

        Qiang Tian made changes -
        Attachment hbase11368-master.patch [ 12677543 ]
        Qiang Tian made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12677543/hbase11368-master.patch
        against trunk revision .
        ATTACHMENT ID: 12677543

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 checkstyle. The applied patch generated 3781 checkstyle errors (more than the trunk's current 3780 errors).

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.client.TestHCM
        org.apache.hadoop.hbase.master.TestSplitLogManager

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
        Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/checkstyle-aggregate.html

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//console

        This message is automatically generated.

        Stephen Yuan Jiang added a comment -

        Qiang Tian and stack, any update or concern on this patch? We have a customer seeing this issue recently.

        Enis Soztutar added a comment -

        My concern with the patch is that it acquires yet another lock per get/scan on top of the already existing ones. Agreed that the region close lock is abused here for multi-CF bulkloads and has to be fixed.

        I believe the actual long-term solution to this is to do ref-counting of store files in the store, and have the store file list per scan be immutable. Then we do not need the costly mechanism for keeping the store files updated between the KVHeap, the scanner and the store file list (notifyChangedReadersObservers). leveldb does ref counting for files, I believe. Lars Hofhansl you had a jira for this?

        Enis Soztutar added a comment -

        I was reading HBASE-4552 and the RegionScannerImpl code again to try to understand why we need the write lock for multi-CF bulk loads in the first place. It seems it was put there to ensure atomicity, but that should be guaranteed with the seqId / mvcc combination and not via the region write lock. However, the bulk load files obtain a seqId, and acquiring the region write lock will block all flushes, which may be the reason. On bulk load, we call HStore.notifyChangedReadersObservers(), which resets the KVHeap, but we never reset the RegionScanner from my reading of the code. Is this a bug? The current scanners should not see the new bulk-loaded data (via mvcc), so maybe it is ok?

        Jerry He added a comment -

        but that should be guaranteed with the seqId / mvcc combination and not via region write lock.

        How? Would the following case be true without the bulk load taking the region write lock?
        a. the bulk load obtains a seqId
        b. a read request comes in and gets the seqId as its mvcc read point
        c. the read will be able to see the partially loaded data while the bulk load is still in progress

        In the 0.98 code line, we don't have the seqId, and the atomicity is still guaranteed there.

        On bulk load, we call HStore.notifyChangedReadersObservers(), which resets the KVHeap, but we never reset the RegionScanner from my reading of code. Is this a bug?

        I think it is being propagated properly to the scanner. Consider that the same notifyChangedReadersObservers is used at the end of compactions and flushes as well. The reset of the readers should work.

        I think the region write lock is still the only guarantee of bulk load atomicity. At a high level, the region scan and next calls run under the region read lock, which is mutually exclusive with the bulk load process that needs the region write lock. This is heavy.

        Nick Dimiduk added a comment -

        Atomicity may be a false blanket considering HBASE-4652 is still unresolved.

        Jerry He added a comment -

        You are right, Nick. That is the unsolved issue.

        Enis Soztutar added a comment -

        How? Would the following case be true without the bulk load getting the region write lock?

        We do not do this now, but in theory it can be done similarly to regular writes.

        • obtain a new seqId as a write transaction
        • bulk load all files across CFs with that seqId
        • advance the mvcc read point only when all bulk loads are complete
        This way the scanners are guaranteed to observe the bulk-loaded data atomically, without the region write lock (sketched after this comment).

        In the 0.98 code line, we don't have seqid, and the atomicity is still guaranteed there.

        Yes. Not worth changing the 0.98 line.

        I think it is being propagated properly to the scanner. Think about the same notifyChangedReadersObservers is being used at the end of compaction and flushes as well. The reset of the readers should work.

        I am not sure about that. Agreed that the cells at the store level will actually get re-ordered, but the heap at the region level is never re-ordered. So, after a bulk load, the ordering of store scanners at the region level might change, but the scanner will miss it, if I understand this correctly.

        Atomicity may be a false blanket considering HBASE-4652 is still unresolved.

        Very good point. We need a transactional commit for the BL files.
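
        A minimal, self-contained sketch of the seqId/mvcc idea above (a toy model, not HBase's MultiVersionConcurrencyControl): all CFs' files are tagged with one new sequence id, but readers only see data up to the mvcc read point, which is advanced only after every CF's files are in place, so no region write lock is needed for atomicity.

          import java.util.ArrayList;
          import java.util.List;
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;
          import java.util.concurrent.CopyOnWriteArrayList;
          import java.util.concurrent.atomic.AtomicLong;

          public class MvccBulkLoadSketch {
            /** Toy store file: just a name plus the seqId it was committed with. */
            static final class ToyFile {
              final String name;
              final long seqId;
              ToyFile(String name, long seqId) { this.name = name; this.seqId = seqId; }
            }

            private final AtomicLong nextSeqId = new AtomicLong(0);
            private final AtomicLong mvccReadPoint = new AtomicLong(0);
            private final Map<String, List<ToyFile>> storesByCf = new ConcurrentHashMap<>();

            void bulkLoadHFiles(Map<String, List<String>> familyPaths) {
              long seqId = nextSeqId.incrementAndGet();   // one write "transaction" for the whole load
              familyPaths.forEach((cf, files) -> {
                List<ToyFile> store = storesByCf.computeIfAbsent(cf, k -> new CopyOnWriteArrayList<>());
                files.forEach(f -> store.add(new ToyFile(f, seqId)));
              });
              // Advance the read point only after every CF's files are in place, so
              // scanners observe the whole bulk load atomically, without a region write lock.
              mvccReadPoint.accumulateAndGet(seqId, Math::max);
            }

            /** A scanner skips anything newer than the read point it started with. */
            List<String> scan(String cf) {
              long readPoint = mvccReadPoint.get();
              List<String> visible = new ArrayList<>();
              for (ToyFile f : storesByCf.getOrDefault(cf, new CopyOnWriteArrayList<>())) {
                if (f.seqId <= readPoint) {
                  visible.add(f.name);
                }
              }
              return visible;
            }
          }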

        Esteban Gutierrez added a comment -

        Hey Qiang Tian, are you still working on this?

        Also, I agree with Enis Soztutar that ref-counting might be an alternative, but I couldn't find the JIRA for that. Any pointer, Lars Hofhansl?

        Transition: Open -> Patch Available
        Time In Source Status: 133d 2h 55m
        Execution Times: 1
        Last Executer: Qiang Tian
        Last Execution Date: 28/Oct/14 06:48

          People

          • Assignee: Qiang Tian
          • Reporter: stack
          • Votes: 0
          • Watchers: 13

            Dates

            • Created:
              Updated:

              Development