HBase / HBASE-5569

Do not collect deleted KVs when they are still in use by a scanner.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0, 0.95.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      I noticed this because TestAtomicOperation.testMultiRowMutationMultiThreads fails rarely.
      The solution is similar to HBASE-2856, where expired KVs are not collected when in use by a scanner.


      What I pieced together so far is that it is the scanning side that has problems sometimes.

      Every time I see an assertion failure in the log I see this before:

      2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0

      The order of the Put and the Delete is sometimes reversed.

      The test threads should always see exactly one KV; if the "before" was the Put, the threads see 0 KVs; if the "before" was the Delete, the threads see 2 KVs.

      This debug message comes from StoreScanner.checkReseek. It seems we still have some consistency issues with scanning sometimes.

      1. TestAtomicOperation-output.trunk_120313.rar
        19 kB
        Nicolas Liochon
      2. 5569-v4.txt
        6 kB
        Lars Hofhansl
      3. 5569-v3.txt
        5 kB
        Lars Hofhansl
      4. 5569-v2.txt
        3 kB
        Lars Hofhansl
      5. 5569.txt
        0.7 kB
        Lars Hofhansl


          Activity

          Lars Hofhansl added a comment -

          Here's the other case.

          2012-03-13 01:34:06,674 DEBUG [Thread-287] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/56043/DeleteColumn/vlen=0,and after = rowB/colfamily11:qual1/54931/Put/vlen=6

          Locally I have not been able to reproduce this, yet.

          stack added a comment -

          These are the failures we saw up on builds.apache.org? There was a failure in hadoopqa too. Are you including that? Good on you, Lars.

          stack added a comment -

          This debug message comes from StoreScanner.checkReseek. It seems we still have some consistency issues with scanning sometimes.

          Or is this a bug we've introduced recently?

          Lars Hofhansl added a comment -

          A bit more context:

          One case:

          2012-03-12 21:48:49,497 INFO [Thread-260] regionserver.Store(796): Added /home/jenkins/jenkins-slave/workspace/HBase-TRUNK/trunk/target/test-data/e923fe0e-3b3e-4c67-89ec-4cac8c991955/TestIncrementtestMultiRowMutationMultiThreads/testtable/446f80b650aa093734c2dff4b9581ff8/colfamily11/e0930b6c478c4a5db9eceaead90bc80e, entries=7, sequenceid=75545, filesize=1.0k
          2012-03-12 21:48:49,522 INFO [Thread-260] regionserver.HRegion(1552): Finished memstore flush of ~87.4k/89544, currentsize=20.8k/21320 for region testtable,,1331588915162.446f80b650aa093734c2dff4b9581ff8. in 63ms, sequenceid=75545, compaction requested=true
          2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
          2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.TestAtomicOperation$2(390): []
          Exception in thread "Thread-211" junit.framework.AssertionFailedError at junit.framework.Assert.fail(Assert.java:48)
          at junit.framework.Assert.fail(Assert.java:56)
          at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$2.run(TestAtomicOperation.java:392)

          Another case:

          2012-03-13 01:34:06,655 INFO [Thread-212] regionserver.Store(748): Flushed , sequenceid=56173, memsize=1.8k, into tmp file /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/48f0d653-d644-41be-80ff-90e726af10d4/TestIncrementtestMultiRowMutationMultiThreads/testtable/00fad569500db871769b9d5951b3ed16/.tmp/a0e1d5df9b5344c19ddbc7b11e0cd9d2
          2012-03-13 01:34:06,656 DEBUG [Thread-212] regionserver.Store(773): Renaming flushed file at /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/48f0d653-d644-41be-80ff-90e726af10d4/TestIncrementtestMultiRowMutationMultiThreads/testtable/00fad569500db871769b9d5951b3ed16/.tmp/a0e1d5df9b5344c19ddbc7b11e0cd9d2 to /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/48f0d653-d644-41be-80ff-90e726af10d4/TestIncrementtestMultiRowMutationMultiThreads/testtable/00fad569500db871769b9d5951b3ed16/colfamily11/a0e1d5df9b5344c19ddbc7b11e0cd9d2
          2012-03-13 01:34:06,661 INFO [Thread-212] regionserver.Store(796): Added /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/48f0d653-d644-41be-80ff-90e726af10d4/TestIncrementtestMultiRowMutationMultiThreads/testtable/00fad569500db871769b9d5951b3ed16/colfamily11/a0e1d5df9b5344c19ddbc7b11e0cd9d2, entries=11, sequenceid=56173, filesize=1.2k
          2012-03-13 01:34:06,674 DEBUG [Thread-287] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/56043/DeleteColumn/vlen=0,and after = rowB/colfamily11:qual1/54931/Put/vlen=6
          2012-03-13 01:34:06,674 DEBUG [Thread-287] regionserver.TestAtomicOperation$2(390): [rowA/colfamily11:qual1/56043/Put/vlen=6, rowB/colfamily11:qual1/54931/Put/vlen=6]
          Exception in thread "Thread-287" junit.framework.AssertionFailedError at junit.framework.Assert.fail(Assert.java:48)
          at junit.framework.Assert.fail(Assert.java:56)
          at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$2.run(TestAtomicOperation.java:392)
          2012-03-13 01:34:06,675 INFO [Thread-212] regionserver.HRegion(1552): Finished memstore flush of ~380.5k/389664, currentsize=28.5k/29192 for region testtable,,1331602436835.00fad569500db871769b9d5951b3ed16. in 44ms, sequenceid=56173, compaction requested=true

          So it seems this is related to flushing (the test flushes frequently - once per second - precisely to exercise this scenario).

          Lars Hofhansl added a comment -

          These are the failures we saw up on builds.apache.org? There was a failure in hadoopqa too. Are you including that?

          Yep those are the ones. I recall now that I have occasionally seen these before.

          Or is this a bug we've introduced recently?

          Possible, but I do not think that is likely.
          Maybe the test code is not valid?
          Or maybe there is more work to do for multi-row transactions and scanners do not yet see Puts and Deletes atomically across multiple rows...?

          Lars Hofhansl added a comment -

          I wonder if this has to do with HBASE-5568?
          I have multiple threads here that flush the same HRegion directly.

          stack added a comment -

          Ugh. Indexing JIRA lost my comment.

          Looking at builds, we don't have much of a history on trunk builds but TestAtomicOperation started failing today when "HBASE-5399 Cut the link between the client and the zookeeper ensemble" went in (among others). I see over in hadoopqa builds that it doesn't fail if I go back twenty odd builds. It did break here, https://builds.apache.org/view/G-L/view/HBase/job/PreCommit-HBASE-Build/1168/, and on a later build. Should I try reverting it?

          Lars Hofhansl added a comment -

          I'll run this in a loop on my work machine (8 cores + hyperthreading), which should increase the likelihood of this happening.
          I will then avoid the parallel flushing and see if that fixes the problem.

          I think the test always had this problem. On the other hand, I do think this indicates a problem with scanning.
          This is suspicious, and the code producing this was also added relatively recently:

          Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen

          Lars Hofhansl added a comment -

          I cannot make this test fail locally, it seems. It has been running in a loop for an hour now (the test takes ~12s on my machine).

          Lars Hofhansl added a comment -

          Are my assumptions about scanning wrong here?

          The test works as follows:
          A bunch of threads alternate putting a column on RowA and deleting that column on RowB in a transaction (the next time the delete is on RowA and the put on RowB).
          Then they each scan starting at RowA and expect to always see exactly one KV (either the column in RowA or the one in RowB).

          So this relies on a scan providing an atomic view over the two rows (which I think should work if both RowA and RowB are rolled forward with the same MVCC writepoint).
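
          A minimal sketch of that loop, assembled from the test snippets quoted later in this thread (names such as putRow, delRow, and ts are illustrative; this is a paraphrase, not the exact TestAtomicOperation source):

          // Apply a Put and a Delete to two rows atomically, then verify
          // that a scan sees exactly one KV.
          List<Mutation> mrm = new ArrayList<Mutation>();
          Put p = new Put(putRow, ts);
          p.add(fam1, qual1, value1);
          mrm.add(p);
          Delete d = new Delete(delRow);
          d.deleteColumns(fam1, qual1, ts);
          mrm.add(d);
          region.mutateRowsWithLocks(mrm, rowsToLock); // both commit together

          Scan s = new Scan(row);                      // start scanning at RowA
          RegionScanner rs = region.getScanner(s);
          List<KeyValue> r = new ArrayList<KeyValue>();
          while (rs.next(r));                          // drain the scanner
          assertEquals(1, r.size());                   // exactly one KV should be visible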

          Lars Hofhansl added a comment -

          Ok... Failed locally once now as well.

          Lars Hofhansl added a comment -

          The check that issues the above DEBUG message was added as part of HBASE-5121.
          Interestingly that issue is only about major compactions, and this test does not have any major compactions, so maybe HBASE-5121 is incorrect?

          stack added a comment -

          hbase-5121 does mess w/ scanners... Seems like a genuine issue though, what hbase-5121 is trying to solve. Pity it's so hard to verify that this started the failures, else we could back it out for now. Should we back it out anyway and see if we get failures over the next few days?

          Lars Hofhansl added a comment -

          I can try backing out HBASE-5121 and see if I can still get this failure.

          I do think my assumptions about scanning were wrong, though. HBASE-5229 is still valid (in that it makes a bunch of operations across multiple rows either all fail or all succeed); it's just that there is currently no way to get a consistent scan over multiple rows when flushing is involved (which is OK, because the scanner contract never guaranteed that). If that is the case I should disable the test.

          TestAtomicOperation.testRowMutationMultiThreads basically does the same thing only within the same row, I have never seen that one fail.

          Nicolas Liochon added a comment -

          testRowMutationMultiThreads logs, on trunk as of today. It failed after 200 iterations.

          Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.007 sec <<< FAILURE!
          testRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation)  Time elapsed: 8.651 sec  <<< FAILURE!
          junit.framework.AssertionFailedError: expected:<0> but was:<8>
          	at junit.framework.Assert.fail(Assert.java:50)
          	at junit.framework.Assert.failNotEquals(Assert.java:287)
          	at junit.framework.Assert.assertEquals(Assert.java:67)
          	at junit.framework.Assert.assertEquals(Assert.java:199)
          	at junit.framework.Assert.assertEquals(Assert.java:205)
          	at org.apache.hadoop.hbase.regionserver.TestAtomicOperation.testRowMutationMultiThreads(TestAtomicOperation.java:331)
          
          chunhui shen added a comment -

          The check that issues the above DEBUG message ("Storescanner.peek() is changed...") was added as part of HBASE-5121, and it may also print after a flush.

          This DEBUG message means StoreScanner.peek() is changed after a compaction, a flush, or anything else that calls Store.notifyChangedReadersObservers().

          chunhui shen added a comment -

          I think it may be the test case's problem.

          Between the time thread1 executes

          region.mutateRowsWithLocks(mrm, rowsToLock);

          and

          Scan s = new Scan(row);
          RegionScanner rs = region.getScanner(s);
          List<KeyValue> r = new ArrayList<KeyValue>();
          while (rs.next(r));

          another thread2 may execute

          Put p = new Put(row2, ts);
          p.add(fam1, qual1, value1);
          mrm.add(p);
          Delete d = new Delete(row);
          d.deleteColumns(fam1, qual1, ts);
          mrm.add(d);

          which will delete row, so thread1 may not get any data.

          Apologies if this suggestion is incorrect, thanks.

          Ted Yu added a comment -

          @Chunhui:
          This makes sense.

          Looks like the test case can utilize HBASE-5515: Add a processRow API that supports atomic multiple reads and writes on a row

          Lars Hofhansl added a comment - - edited

          Well... The whole point of the new API was to have atomic operations.
          The Put and the Delete are executed atomically together and visible at the same time.
          Note that the code alternates putting row and deleting row2, and then putting row2 and deleting row. The scan then ensures that exactly one column is visible.

          In this case the scan itself is inconsistent. And worse, as Nicolas found out, even testRowMutationMultiThreads fails sometimes, and that is just a single row, so this should never happen.

          So I am not entirely convinced the test is at fault.

          For example the scenario described above:
          if

          Put p = new Put(row2, ts);
          p.add(fam1, qual1, value1);
          mrm.add(p);
          Delete d = new Delete(row);
          d.deleteColumns(fam1, qual1, ts);
          mrm.add(d);

          happened between

          region.mutateRowsWithLocks(mrm, rowsToLock);
          

          and

          
          Scan s = new Scan(row);
          RegionScanner rs = region.getScanner(s);
          List<KeyValue> r = new ArrayList<KeyValue>();
          while (rs.next(r));
          

          Both the Put and the Delete would happen atomically with the same WALEdit and the same MVCC writepoint. So the scan will now see the other row (it sees either row or row2, because row (RowA) sorts before row2 (RowB)).
          This has nothing to do with race conditions between threads, but only occurs with flushes in the test. I'll remove the forced flushes and then run the test again.

          Lars Hofhansl added a comment -

          Can't unpack the rar file (guess I need the non-free unrar package, and as a principle I do not install non-free software on my machines).
          What I really just need to know is whether there are messages like those in the description right before any assertion failures.

          chunhui shen added a comment -

          @Lars
          Maybe I didn't explain it clearly.

          We could consider the following scenario:

          Time 1, Thread 1: row is deleted and row2 is put, so now the only real KV in HBase is row2.

          Time 2, Thread 1: does RegionScanner rs = region.getScanner(s); the RS opens the scanner, which points to row2 as the next KV.

          Time 3, Thread 2: row2 is deleted and row is put, so now the only real KV in HBase is row.

          Time 4, Thread 1: does while(rs.next(r)); because the scanner is pointing at row2, which has since been deleted, rs.next(r) will get nothing even though row is in HBase.

          To fix this issue, we should do scanner.seek in scanner.next() rather than in the construction of the scanner.
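
          A minimal, self-contained sketch of that suggestion (an illustration with made-up names, not actual HBase code): deferring the seek into next() means the scanner positions itself against the data as it exists at read time.

          import java.util.Iterator;
          import java.util.NavigableMap;
          import java.util.TreeMap;

          class LazySeekScannerSketch {
            private final NavigableMap<String, String> store;
            private final String startRow;
            private Iterator<String> it; // not positioned until the first next()

            LazySeekScannerSketch(NavigableMap<String, String> store, String startRow) {
              this.store = store;
              this.startRow = startRow;  // the constructor does NOT seek
            }

            String next() {
              if (it == null) {          // the lazy seek happens here instead
                it = store.tailMap(startRow, true).keySet().iterator();
              }
              return it.hasNext() ? it.next() : null;
            }

            public static void main(String[] args) {
              NavigableMap<String, String> data = new TreeMap<String, String>();
              data.put("rowA", "v1");
              data.put("rowB", "v2");
              LazySeekScannerSketch s = new LazySeekScannerSketch(data, "rowA");
              data.remove("rowA");          // mutation between construction and next()
              System.out.println(s.next()); // prints rowB: the seek reflects current data
            }
          }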

          Lars Hofhansl added a comment -

          Each scanner should only see KVs according to its mvcc readpoint.
          What you describe could also happen with KVs "inside" the same row, and the mvcc readpoint guards against this.
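
          To make that rule concrete, a small self-contained sketch of the readpoint guard (an illustration of the invariant, not HBase's actual MVCC implementation):

          import java.util.ArrayList;
          import java.util.List;

          class MvccReadpointSketch {
            static class KV {
              final String key;
              final long memstoreTS; // mvcc write number assigned when the KV was written
              KV(String key, long memstoreTS) { this.key = key; this.memstoreTS = memstoreTS; }
            }

            /** A scanner with the given readpoint may only see KVs committed at or before it. */
            static List<KV> visible(List<KV> kvs, long readPoint) {
              List<KV> out = new ArrayList<KV>();
              for (KV kv : kvs) {
                if (kv.memstoreTS <= readPoint) {
                  out.add(kv); // committed before the scanner started
                }
              }
              return out;
            }

            public static void main(String[] args) {
              List<KV> kvs = new ArrayList<KV>();
              kvs.add(new KV("row/qual1/Put", 5));
              kvs.add(new KV("row/qual1/DeleteColumn", 7)); // written after our readpoint
              // A scanner that started at readpoint 5 must not see the later delete.
              System.out.println(visible(kvs, 5).size());   // prints 1
            }
          }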

          chunhui shen added a comment -

          Is there any possibility that region.flush breaks the rule that each scanner should only see KVs according to its mvcc readpoint?
          Because in the current flush logic, KVs are removed when flushing if they are covered by a marker of delete type.

          Lars Hofhansl added a comment -

          But here's a thought. Unless KEEP_DELETED_CELLS is set to true for a store, a flush will unconditionally purge all deleted rows (I put that optimization in myself)... That might be a hole in HBASE-2856, since this was never needed before.
          HBASE-2856 delays expiration of KVs until all scans are finished, but it does not do this for deleted cells.

          I'm trying now with KEEP_DELETED_CELLS enabled to see if I can still reproduce this problem.
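
          For reference, a sketch of how that experiment can be wired up (assuming the 0.94-era HColumnDescriptor API):

          import org.apache.hadoop.hbase.HColumnDescriptor;
          import org.apache.hadoop.hbase.HTableDescriptor;

          class KeepDeletedCellsSketch {
            public static void main(String[] args) {
              HTableDescriptor htd = new HTableDescriptor("testtable");
              HColumnDescriptor hcd = new HColumnDescriptor("colfamily11");
              hcd.setKeepDeletedCells(true); // deleted cells survive flushes and minor compactions
              htd.addFamily(hcd);
            }
          }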

          Lars Hofhansl added a comment -

          Comment crossing... We had the same thought.

          Lars Hofhansl added a comment - - edited

          But note that sometimes the other case happens, and we see two rows!

          Nicolas Liochon added a comment -

          There's no message about Storescanner.peek, and no error or warning. Here's the log when it fails:

          2012-03-14 03:14:02,146 DEBUG [Thread-51] regionserver.TestAtomicOperation$1(305): keyvalues=NONE
          Exception in thread "Thread-51" 
          junit.framework.AssertionFailedError
          	at junit.framework.Assert.fail(Assert.java:48)
          	at junit.framework.Assert.fail(Assert.java:56)
          	at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$1.run(TestAtomicOperation.java:307)
          2012-03-14 03:14:02,228 DEBUG [Thread-92] regionserver.TestAtomicOperation$1(279): flushing
          

          Reproduced on the Feb 24th trunk as well, after ~700 iterations, same logs.

          chunhui shen added a comment -

          Since the mvcc readpoint guards against a scanner seeing KVs outside its readpoint, why could we see two rows?

          Lars Hofhansl added a comment -

          That is the question

          Lars Hofhansl added a comment -

          HBASE-2856 added logic for when KVs can be expired (either by version or TTL); it did not add this same logic for deleted rows (i.e. for deletes the rug can be pulled out from under a scan).
          I added that (which ended up being a one-line change once I understood what was going on). Running the test in a loop now.
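
          The shape of that rule, as a hedged sketch (an illustration of the idea, not the actual patch):

          class CollectDeletedKvSketch {
            /**
             * At flush/compaction time, a deleted KV may only be collected if no
             * open scanner could still need it, i.e. if the delete was written at
             * or before the oldest readpoint still in use by any scanner.
             */
            static boolean canCollect(long deleteMemstoreTS, long smallestReadPoint) {
              return deleteMemstoreTS <= smallestReadPoint;
            }

            public static void main(String[] args) {
              // A delete written at mvcc 7 while a scanner still reads at 5 must be kept.
              System.out.println(canCollect(7, 5)); // false: keep the KV for the scanner
              System.out.println(canCollect(5, 7)); // true: safe to collect
            }
          }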

          Lars Hofhansl added a comment -

          Here's the patch. Still running tests in a loop, no failure, yet.
          Attaching here, so that I can get a HadoopQA run.

          Ted Yu added a comment -

          +1 on patch.

          Lars Hofhansl added a comment -

          I notice the test takes much longer to complete now. Before the change it was 11s, now it's about 90s.
          That might just be the nature of the test, as it deletes and puts a lot, and the actual removal of the deleted KVs is delayed (just as it is for expired KVs).

          Lars Hofhansl added a comment -

          Specifically the test deletes a lot of KVs that are still part of a scan and hence cannot be removed, so I think this is ok as far as this test goes.

          Ted Yu added a comment -

          Currently the test is marked as a medium test.
          Can we lower the number of threads in the test?

              for (int i = 0; i < numThreads; i++) {
          
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518357/5569.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1188//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1188//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1188//console

          This message is automatically generated.

          Lars Hofhansl added a comment -

          @Ted: Probably. Or make it a large test.
          I'll leave the test running in a loop for the rest of the day before I conclude anything. There might just be lower concurrency now, and hence the problem is less likely to be seen.
          BTW. On my machine at home the time went from 70s to 400s.

          I assume we'd see the same in a test with a CF with VERSIONS=1 where we put and scan in parallel. After HBASE-2856 went in, such puts could not be collected at flush time while they were in use by a scan; now, with this change, the same happens for deletes.

          Lars Hofhansl added a comment -

          Test is still running in a loop, hasn't failed, yet.
          I'll do some more performance tests to make sure this is only slowed down when needed (i.e. when scans are being performed).

          Lars Hofhansl added a comment -

          If I remove the scanning from the tests the times are back to what it was before, suggesting that the extra work is due to keeping (and flushing) deleted cells that cannot be collected because they are part of a scan.

          I'm happy with this outcome, and I would like to commit this change.
          Ted +1'd, I'm +1 (obviously), but it wouldn't hurt to have another pair of eyes or two looking at this.

          Lars Hofhansl added a comment -

          Same change.
          In addition reduced number of threads to 50 and number of iterations to 500 to bring test runtimes to about 15s (on my machine).

          Lars Hofhansl added a comment -

          Will run tests over night and commit tomorrow morning unless I see a test failure or I get any objections.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518404/5569-v2.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1190//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1190//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1190//console

          This message is automatically generated.

          Lars Hofhansl added a comment - - edited

          Ran all tests in TestAtomicOperation more than 3000 times without a failure.

          Lars Hofhansl added a comment -

          3500 test runs, no failures. Going to commit if nobody objects.

          Lars Hofhansl added a comment -

          Committed to 0.94 and trunk

          stack added a comment -

          @Lars Good one.

          Hudson added a comment -

          Integrated in HBase-0.94 #32 (See https://builds.apache.org/job/HBase-0.94/32/)
          HBASE-5569 Do not collect deleted KVs when they are still in use by a scanner. (Revision 1301138)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
          Hudson added a comment -

          Integrated in HBase-TRUNK #2683 (See https://builds.apache.org/job/HBase-TRUNK/2683/)
          HBASE-5569 Do not collect deleted KVs when they are still in use by a scanner. (Revision 1301135)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #139 (See https://builds.apache.org/job/HBase-TRUNK-security/139/)
          HBASE-5569 Do not collect deleted KVs when they are still in use by a scanner. (Revision 1301135)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
          Nicolas Liochon added a comment -

          fwiw, I still get the error on testRowMutationMultiThreads after a few hundred iterations... Same logs as above.

          Lars Hofhansl added a comment -

          Hmm... I ran the tests (all of them - including testRowMutationMultiThreads) over 4000 times, didn't fail.
          testMultiRowMutationMultiThreads is definitely fixed (failed after a few dozen executions before).

          There might be yet another much rarer problem with testRowMutationMultiThreads. I've never seen it fail on the build machines, yet.

          Any chance you could attach the latest logs (as zip or tar)?

          Btw, this:

          2012-03-14 03:14:02,146 DEBUG [Thread-51] regionserver.TestAtomicOperation$1(305): keyvalues=NONE
          Exception in thread "Thread-51" 
          junit.framework.AssertionFailedError
          	at junit.framework.Assert.fail(Assert.java:48)
          	at junit.framework.Assert.fail(Assert.java:56)
          	at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$1.run(TestAtomicOperation.java:307)
          2012-03-14 03:14:02,228 DEBUG [Thread-92] regionserver.TestAtomicOperation$1(279): flushing
          

          is just when the test detects the problem. The actual problem should be in the logs some time before that.

          Lars Hofhansl added a comment -

          Ran testRowMutationMultiThreads another 1000 times on my work machine without any failures.
          Then I ran it at home (much slower machine - but fast SSD) and saw a failure indeed pretty quickly. Hmm...

          Lars Hofhansl added a comment -

          Interestingly I saw this right before:

          2012-03-16 19:24:30,523 DEBUG [Thread-46] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowA/colfamily11:qual1/2561/DeleteColumn/vlen=0,and after = rowA/colfamily11:qual1/2561/DeleteColumn/vlen=0

          Which makes no sense, because before and after are the same KV.

          Lars Hofhansl added a comment -

          So I suppose this can happen when the two deletes differ only by memstoreTS.
          This is a different problem from the one I fixed in this issue.

          Lars Hofhansl added a comment -

          Added some extra logging. Turns out that the after KV always has memstoreTS=0.
          I have to conclude that this is not fixed, yet.

          Lars Hofhansl added a comment -

          This must be some strange timing issue since it never happens on my fast work machine.
          I think I'll revert the change until I understand this better.

          Lars Hofhansl added a comment -

          Reverted from 0.94 and trunk. Sigh.
          A few more details:

          • This definitely has something to do with StoreScanner.checkReseek/resetScannerStack.
          • I always see the DEBUG message about the StoreScanner.peek being changed.
          • Removing the code for HBASE-5121 does not fix this problem.
          • This is not related to HBASE-5568.
          • The new KV on the heap is always older than the existing one (so the scanner is going backwards in this case)! In this test the client threads assign the timestamps, so one of them might just fall behind.
          • The new KV on the heap always has memstoreTS=0.
          • Either the new or the old KV is a delete marker (but that might be because of the nature of this test).
          • Both testRowMutationMultiThreads and testMultiRowMutationMultiThreads have the same problem. So this happens even for Puts and Deletes on the same row, even when they are written with the same mvcc write number and in the same WALEdit.

          I'll see if I can write a more deterministic test for this.

          Lars Hofhansl added a comment -

          One last point: This seems to be extremely sensitive to the machine it is running on.
          Among the various loops I ran on my work machine I ran the test close to 10000 times and have not observed a single failure on that machine (with my changes applied), while on my home machine this is relatively easy to reproduce.

          Hudson added a comment -

          Integrated in HBase-0.94 #38 (See https://builds.apache.org/job/HBase-0.94/38/)
          Revert HBASE-5569 (Revision 1301873)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #141 (See https://builds.apache.org/job/HBase-TRUNK-security/141/)
          Revert HBASE-5569 (Revision 1301872)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
          Lars Hofhansl added a comment -

          I spent a lot more time looking at this. I thought it might be due to the flushes being executed in parallel by multiple threads, but synchronizing this part made the failure more likely!
          Doing this and increasing the frequency of flushes now reproduces the problem multiple times on every test run, which is good.

          But... My initial hunch was correct. When I enable KEEP_DELETED_CELLS on the store the problem goes away!
          Hence this definitely has to do with collection of deletes and delete markers.

          Lars Hofhansl added a comment -

          Here's my theory...
          In ScanQueryMatcher we have this:

          byte type = kv.getType();
          if (kv.isDelete()) {
            if (!keepDeletedCells) {
              ...
              // register the delete marker so the KVs it affects can be skipped
              this.deletes.add(bytes, offset, qualLength, timestamp, type);
            }
            ...
          } else if (!this.deletes.isEmpty()) {
            // check whether this (non-delete) KV is masked by a registered marker
            DeleteResult deleteResult = deletes.isDeleted(bytes, offset, qualLength,
                timestamp);
            ...
          }
          

          And in StoreScanner.resetScannerStack

          // Reset the state of the Query Matcher and set to top row.
          // Only reset and call setRow if the row changes; avoids confusing the
          // query matcher if scanning intra-row.
          ...
          if ((matcher.row == null) || !kv.matchingRow(matcher.row)) {
            matcher.reset();
            matcher.setRow(kv.getRow());
          }
          

          So, the SQM might already have a delete registered, or might miss a delete.
          With KEEP_DELETED_CELLS that race does not happen, because deletes are simply not registered.

          Lars Hofhansl added a comment -

          This

          // Only register the delete marker (making deleted KVs eligible for
          // collection) if every scanner still open can already see the marker,
          // i.e. its memstoreTS is at or below the smallest outstanding readpoint.
          if (includeDeleteMarker
              && kv.getMemstoreTS() <= maxReadPointToTrackVersions) {
            this.deletes.add(bytes, offset, qualLength, timestamp, type);
          }
          

          Fixes the issue. Note that maxReadPointToTrackVersions is actually the minimum readpoint of any scanner still operating in the region, and it is only set during compaction.
          I think this is correct because of the following:
          All delete markers precede the KVs they affect. So by not adding the delete marker, it is guaranteed that no KVs that might still be in use will be removed during a flush. It also removes this race condition between scanners and flushes.

          So my previous fix was almost correct (in thought at least). I had believed it to be correct, because I had not been able - not even a single time - to reproduce this on my work machine.
          I'll attach a patch soon.

          Lars Hofhansl added a comment -

          New patch.
          Also adds code to show the memstoreTS in KV.toString.
          The number of loops in TestAtomicOperation was reduced and the number of flushes increased.

          Please have a careful look.
          It would be very helpful if some other folks could run TestAtomicOperation in a loop for a while (considering that this problem did not occur at all on my work machine).

          Lars Hofhansl added a comment -

          Getting a test run.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518850/5569-v3.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestCompaction
          org.apache.hadoop.hbase.TestKeyValue

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1220//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1220//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1220//console

          This message is automatically generated.

          stack added a comment -

          Nice work Lars. Will review/test tomorrow.

          Lars Hofhansl added a comment -

          Thanks Stack.
          TestKeyValue is a simple fix (because I changed the output of KV.toString()).
          TestCompaction looks worrisome, checking it out now.

          Nicolas Liochon added a comment -

          I've got testRowMutationMultiThreads currently running with patch v3. No issue so far. I will make it run 5000 times; previously it always failed before 1000 iterations.

          Lars Hofhansl added a comment -

          TestCompaction.testMajorCompactingToNoOutput fails because the first scanner in the test was not closed before the compaction was run. Hence the compaction could not remove the deleted rows, because a scanner was still (potentially) using them.

          The test is easily fixed (need to close the first scanner), but we need to think about whether this is the design we want.
          This is the same behavior we have with HBASE-2856 for expired rows (TTL or too many versions): if a scanner is open with an earlier readpoint, these will not be collected.
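
          A sketch of that rule at the client level (0.94 API; the configuration and table name are illustrative):

          Configuration conf = HBaseConfiguration.create();
          HTable table = new HTable(conf, "testtable");
          HBaseAdmin admin = new HBaseAdmin(conf);
          ResultScanner scanner = table.getScanner(new Scan());
          // ... issue deletes, flush ...
          scanner.close();                  // an open scanner pins deleted KVs,
          admin.majorCompact("testtable");  // so close it before expecting the
                                            // major compaction to drop them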

          Lars Hofhansl added a comment -

          Thanks N.!

          Lars Hofhansl added a comment -

          Same patch with fixes for TestKeyValue and TestCompaction.

          Lars Hofhansl added a comment -

          @N.: Note that the patch also reduces the number of threads for test[Multi]RowMutationMultiThreads and increases the rate of flushes per thread.
          This made it (far) more likely to fail on my home machine; it might be different on your machine.

          I should note that both tests now fail every time on my home machine without the patch, but do not with it.

          Nicolas Liochon added a comment -

          Right now it's still running well. I'm doing the test on a small server with a 4-core Intel Xeon E3-1220.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518856/5569-v4.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1221//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1221//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1221//console

          This message is automatically generated.

          Nicolas Liochon added a comment -

          I stopped it after 2700 iterations (10 hours) with no errors => the patch seems to fix the issue...

          Lars Hofhansl added a comment -

          Thanks N. Good news! Tests pass too. I'm going to wait for some other folks to test on their machines to be extra sure this time.

          I need to be extra clear here:
          This patch will prevent any deleted KVs from being collected upon flush or compaction if there is a scanner open with a readpoint smaller than the KV's memstoreTS (HBASE-2856 does the same for expired KVs).
          Furthermore, this is only needed for mixed Delete and Put operations, although it will generally prevent a flush/compaction from pulling the rug out from under a scanner.

          Personally, I think this is an important fix. However, I want to mention that the alternative is to remove the mutateRows functionality (obviously not my favorite choice), or to document that it only works with KEEP_DELETED_CELLS enabled (also not my favorite outcome).
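
          For completeness, this is the per-family setting that last alternative refers to; a sketch with the 0.94 admin API (table and family names are illustrative):

          HColumnDescriptor hcd = new HColumnDescriptor("colfamily11");
          hcd.setKeepDeletedCells(true); // deleted cells survive flush/compaction
                                         // until TTL or version limits expire them
          HTableDescriptor htd = new HTableDescriptor("testtable");
          htd.addFamily(hcd);
          new HBaseAdmin(HBaseConfiguration.create()).createTable(htd);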

          Lars Hofhansl added a comment -

          If some more folks would run the tests in a loop with the patch applied that'd be of great help.

          Lars Hofhansl added a comment -

          Ran more variations of the test (different number of threads, loops, synchronized flushing or not). Each time I see a failure after 2-3 runs without the patch, and no failures with the patch after at least 20 iterations.

          Lars Hofhansl added a comment -

          One more +1 anyone?
          I think this is an important feature.

          Ted Yu added a comment -

          +1 from me.

          Lars Hofhansl added a comment -

          Thanks. Going to commit soon.
          @Stack: wanna have a quick look (also at my comment from 19/Mar/12 15:50)?

          stack added a comment -

          Your 'being extra clear' note needs to become the release note.

          What does this mean?

          This patch will prevent any deleted KVs from being collected upon flush or compaction if there is a scanner open with a readpoint smaller than the KV's memstoreTS (HBASE-2856 does the same for expired KVs).

          Do they stay in the memstore or in the snapshot? Or rather, are they attached to the outstanding scanners? Or do we still 'see' them in files or memstores when there are outstanding scanners and the delete is newer than the scanner read point?

          Patch looks fine – makes sense even – but I'm not up on subtleties that abound in this code.

          We didn't output the ts in KV.toString? That's odd.

          +1

          Lars Hofhansl added a comment -

          @Stack:
          So HBASE-2856 has logic that prevents KVs from being removed in a flush or compaction when they have expired (due to TTL or too many versions) but there is still a scanner open with a readpoint <= the KV's memstoreTS (which means those KVs were created after the scanner was opened).
          Say you have set your store to 3 versions. Now you create 10 versions of a KV; the extra 7 versions are not removed during a flush or compaction while a scanner that was opened before the KVs were created is still open.
          This patch adds the same for deleted KVs (IMHO that is something that HBASE-2856 missed). So now neither expired nor deleted KVs are collected if a scanner could still access them.

          It means that a flush or compaction needs to copy these KVs to the new store file instead of skipping them. This only happens for KVs that were created (or deleted) after the scanner(s) were opened.
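
          As a sketch of what that means in practice (0.94 client API; table, row, and family names are illustrative), assume a family with max versions = 3:

          // conf, table, and admin set up as in the earlier sketches
          ResultScanner scanner = table.getScanner(new Scan()); // readpoint fixed here
          for (long ts = 1; ts <= 10; ts++) {
            Put p = new Put(Bytes.toBytes("r"));
            p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), ts, Bytes.toBytes("v" + ts));
            table.put(p);
          }
          admin.flush("testtable"); // the 7 excess versions are copied to the new
                                    // store file rather than collected, because the
                                    // open scanner above could still need them
          scanner.close();          // once no scanner needs them, a later
                                    // flush/compaction may collect them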

          The output I added is the memstoreTS; the ts is already part of toString.

          I don't think this needs to be part of the release notes (at least we did not add it to the release notes for HBASE-2856 or its backport).

          stack added a comment -

          Ok. Thanks for explaination. +1 on commit.

          Lars Hofhansl added a comment -

          Committed to 0.94 and trunk. Pheeww.

          Hudson added a comment -

          Integrated in HBase-0.94 #43 (See https://builds.apache.org/job/HBase-0.94/43/)
          HBASE-5569 Do not collect deleted KVs when they are still in use by a scanner. (Revision 1303222)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestKeyValue.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #144 (See https://builds.apache.org/job/HBase-TRUNK-security/144/)
          HBASE-5569 Do not collect deleted KVs when they are still in use by a scanner. (Revision 1303220)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestKeyValue.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
          Hudson added a comment -

          Integrated in HBase-TRUNK #2689 (See https://builds.apache.org/job/HBase-TRUNK/2689/)
          HBASE-5569 Do not collect deleted KVs when they are still in use by a scanner. (Revision 1303220)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestKeyValue.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

            People

            • Assignee:
              Lars Hofhansl
              Reporter:
              Lars Hofhansl
            • Votes:
              0
              Watchers:
              5
