Uploaded image for project: 'Tephra'
  1. Tephra
  2. TEPHRA-253

TransactionProcessorTest is sometimes flaky

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0-incubating
    • Fix Version/s: 0.13.0-incubating
    • Component/s: None
    • Labels:
      None

      Description

      The test sometimes fails as follows:

      Running org.apache.tephra.hbase.coprocessor.TransactionProcessorTest
      Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.741 sec <<< FAILURE!
      testFamilyDeleteTimestamp(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest)  Time elapsed: 1.526 sec
      testTransactionStateCache(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest)  Time elapsed: 0.053 sec
      testDataJanitorRegionScanner(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest)  Time elapsed: 0.288 sec  <<< FAILURE!
      org.junit.internal.ArrayComparisonFailure: arrays first differed at element [3]; expected:<4> but was:<1>
      	at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:50)
      	at org.junit.Assert.internalArrayEquals(Assert.java:473)
      	at org.junit.Assert.assertArrayEquals(Assert.java:294)
      	at org.junit.Assert.assertArrayEquals(Assert.java:305)
      	at org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.assertKeyValueMatches(TransactionProcessorTest.java:593)
      	at org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.assertKeyValueMatches(TransactionProcessorTest.java:585)
      	at org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.testDataJanitorRegionScanner(TransactionProcessorTest.java:190)
      

      It is not clear what is causing this, most likely the region server did not have an up-to-date transaction state snapshot at the time of the lfush (that might be due to TEPHRA-239 orTEPHRA-249, or it might be a condition where flush() has no effect because the region is already flushing,

      Let's observe this and gather more information when/if it happens again.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/incubator-tephra/pull/54

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/incubator-tephra/pull/54
          Hide
          anew Andreas Neumann added a comment -

          Fix is to wait with the flush until the transaction state is loaded.
          PR: https://github.com/apache/incubator-tephra/pull/54

          Show
          anew Andreas Neumann added a comment - Fix is to wait with the flush until the transaction state is loaded. PR: https://github.com/apache/incubator-tephra/pull/54
          Hide
          anew Andreas Neumann added a comment -

          Suspicion confirmed. After changing travis to dump the standard output of the test case, I see:

          2017-09-09 19:18:16,851 - INFO  [main:o.a.h.h.r.RegionCoprocessorHost@196] - Load coprocessor org.apache.tephra.hbase.coprocessor.TransactionProcessor from HTD of TestRegionScanner successfully.
          2017-09-09 19:18:16,868 - INFO  [StoreOpener-fc704aec719b675f06e5d7bd12da85f0-1:o.a.h.h.r.c.CompactionConfiguration@85] - size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; delete expired; major period 604800000, major jitter 0.500000
          2017-09-09 19:18:16,883 - INFO  [main:o.a.h.h.r.HRegion@644] - Onlined fc704aec719b675f06e5d7bd12da85f0; next sequenceid=1
          2017-09-09 19:18:16,883 - INFO  [main:o.a.t.h.c.TransactionProcessorTest@178] - Coprocessor is using transaction state: null
          2017-09-09 19:18:16,926 - INFO  [main:o.a.t.h.c.TransactionProcessorTest@192] - Flushing region TestRegionScanner,,1504984696824.fc704aec719b675f06e5d7bd12da85f0.
          2017-09-09 19:18:16,960 - INFO  [HDFSTransactionStateStorage STARTING:o.a.t.p.HDFSTransactionStateStorage@109] - Using snapshot dir /home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit8165179254738335598
          2017-09-09 19:18:16,981 - INFO  [TransactionStateCache STARTING:o.a.t.p.HDFSTransactionStateStorage@185] - Read encoded transaction snapshot of 84 bytes
          2017-09-09 19:18:16,984 - INFO  [TransactionStateCache STARTING:o.a.t.c.TransactionStateCache@166] - Transaction state reloaded with snapshot from 1504984695267
          2017-09-09 19:18:17,393 - INFO  [main:o.a.h.h.r.DefaultStoreFlusher@88] - Flushed, sequenceid=37, memsize=5.9 K, hasBloomFilter=true, into tmp file hdfs://localhost:53322/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit7077794411994061305/hbase/data/default/TestRegionScanner/fc704aec719b675f06e5d7bd12da85f0/.tmp/6e813e3b7af94e13afc9dc1303dda3f8
          2017-09-09 19:18:17,415 - INFO  [main:o.a.h.h.r.HStore@770] - Added hdfs://localhost:53322/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit7077794411994061305/hbase/data/default/TestRegionScanner/fc704aec719b675f06e5d7bd12da85f0/f/6e813e3b7af94e13afc9dc1303dda3f8, entries=36, sequenceid=37, filesize=2.2 K
          2017-09-09 19:18:17,416 - INFO  [main:o.a.h.h.r.HRegion@1708] - Finished memstore flush of ~5.9 K/6048, currentsize=0/0 for region TestRegionScanner,,1504984696824.fc704aec719b675f06e5d7bd12da85f0. in 489ms, sequenceid=37, compaction requested=false
          

          Clearly, the flush begins before the transaction state is loaded.

          Show
          anew Andreas Neumann added a comment - Suspicion confirmed. After changing travis to dump the standard output of the test case, I see: 2017-09-09 19:18:16,851 - INFO [main:o.a.h.h.r.RegionCoprocessorHost@196] - Load coprocessor org.apache.tephra.hbase.coprocessor.TransactionProcessor from HTD of TestRegionScanner successfully. 2017-09-09 19:18:16,868 - INFO [StoreOpener-fc704aec719b675f06e5d7bd12da85f0-1:o.a.h.h.r.c.CompactionConfiguration@85] - size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; delete expired; major period 604800000, major jitter 0.500000 2017-09-09 19:18:16,883 - INFO [main:o.a.h.h.r.HRegion@644] - Onlined fc704aec719b675f06e5d7bd12da85f0; next sequenceid=1 2017-09-09 19:18:16,883 - INFO [main:o.a.t.h.c.TransactionProcessorTest@178] - Coprocessor is using transaction state: null 2017-09-09 19:18:16,926 - INFO [main:o.a.t.h.c.TransactionProcessorTest@192] - Flushing region TestRegionScanner,,1504984696824.fc704aec719b675f06e5d7bd12da85f0. 2017-09-09 19:18:16,960 - INFO [HDFSTransactionStateStorage STARTING:o.a.t.p.HDFSTransactionStateStorage@109] - Using snapshot dir /home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit8165179254738335598 2017-09-09 19:18:16,981 - INFO [TransactionStateCache STARTING:o.a.t.p.HDFSTransactionStateStorage@185] - Read encoded transaction snapshot of 84 bytes 2017-09-09 19:18:16,984 - INFO [TransactionStateCache STARTING:o.a.t.c.TransactionStateCache@166] - Transaction state reloaded with snapshot from 1504984695267 2017-09-09 19:18:17,393 - INFO [main:o.a.h.h.r.DefaultStoreFlusher@88] - Flushed, sequenceid=37, memsize=5.9 K, hasBloomFilter=true, into tmp file hdfs://localhost:53322/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit7077794411994061305/hbase/data/default/TestRegionScanner/fc704aec719b675f06e5d7bd12da85f0/.tmp/6e813e3b7af94e13afc9dc1303dda3f8 2017-09-09 19:18:17,415 - INFO [main:o.a.h.h.r.HStore@770] - Added hdfs://localhost:53322/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit7077794411994061305/hbase/data/default/TestRegionScanner/fc704aec719b675f06e5d7bd12da85f0/f/6e813e3b7af94e13afc9dc1303dda3f8, entries=36, sequenceid=37, filesize=2.2 K 2017-09-09 19:18:17,416 - INFO [main:o.a.h.h.r.HRegion@1708] - Finished memstore flush of ~5.9 K/6048, currentsize=0/0 for region TestRegionScanner,,1504984696824.fc704aec719b675f06e5d7bd12da85f0. in 489ms, sequenceid=37, compaction requested=false Clearly, the flush begins before the transaction state is loaded.
          Hide
          anew Andreas Neumann added a comment -

          I suspect that this test is flaky because we call flush on the region very quickly after creating it. The coprocessor may not have read the transaction state at that time. Hence it would not remove invalid transactions recorded in that transaction state, and that fails the test.

          Show
          anew Andreas Neumann added a comment - I suspect that this test is flaky because we call flush on the region very quickly after creating it. The coprocessor may not have read the transaction state at that time. Hence it would not remove invalid transactions recorded in that transaction state, and that fails the test.

            People

            • Assignee:
              anew Andreas Neumann
              Reporter:
              anew Andreas Neumann
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development