Uploaded image for project: 'Accumulo'
  1. Accumulo
  2. ACCUMULO-2405

Race condition in MetadataIT.mergeMeta()

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.6.0
    • None
    • None
    • 2ef2d88598f5e14f8f96b77fecca66dcd7196448

    Description

      Saw the following error while running ITs.

      mergeMeta(org.apache.accumulo.test.functional.MetadataIT)  Time elapsed: 10.39 sec  <<< FAILURE!
      java.lang.AssertionError
      	at org.junit.Assert.fail(Assert.java:86)
      	at org.junit.Assert.assertTrue(Assert.java:41)
      	at org.junit.Assert.assertTrue(Assert.java:52)
      	at org.apache.accumulo.test.functional.MetadataIT.mergeMeta(MetadataIT.java:103)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
      

      Looking at the test it merges the metadata table, sleeps for 2 secs, and then scans for delete entries. Looking in the logs I think the AGC deleted the delete entries while the test was sleeping.

      Saw the following in the master log

      2014-02-25 12:06:36,981 [master.EventCoordinator] INFO : Merge state of !0<< set to MERGING
      2014-02-25 12:06:37,047 [master.EventCoordinator] INFO : Merge state of !0<< set to COMPLETE
      2014-02-25 12:06:37,169 [state.MergeStats] INFO : Computing next merge state for !0<< which is presently COMPLETE isDelete : false
      2014-02-25 12:06:37,189 [master.EventCoordinator] INFO : Merge state of !0<< set to NONE
      2014-02-25 12:06:37,212 [tableOps.MasterRepo] INFO : removing merge information No Merge in progress
      2014-02-25 12:06:37,213 [master.EventCoordinator] INFO : Merge state of !0 cleared
      

      Then saw the following in the AGC logs.

      2014-02-25 12:06:39,335 [gc.SimpleGarbageCollector] DEBUG: Deleting file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.SimpleMacIT/1393347971697_26818/accumulo/tables/!0/t-000003u
      2014-02-25 12:06:39,335 [gc.SimpleGarbageCollector] DEBUG: Deleting file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.SimpleMacIT/1393347971697_26818/accumulo/tables/!0/t-000003v
      2014-02-25 12:06:39,335 [fs.TrashPolicyDefault] INFO : Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
      2014-02-25 12:06:39,335 [gc.SimpleGarbageCollector] DEBUG: Deleting file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.SimpleMacIT/1393347971697_26818/accumulo/tables/!0/t-000003w
      2014-02-25 12:06:39,336 [gc.SimpleGarbageCollector] DEBUG: Deleting file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.SimpleMacIT/1393347971697_26818/accumulo/tables/!0/t-000003y
      2014-02-25 12:06:39,336 [gc.SimpleGarbageCollector] DEBUG: Deleting file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.SimpleMacIT/1393347971697_26818/accumulo/tables/!0/table_info
      2014-02-25 12:06:39,336 [gc.SimpleGarbageCollector] DEBUG: Deleting file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.SimpleMacIT/1393347971697_26818/accumulo/tables/!0/t-000003x
        .
        .
        .
      2014-02-25 12:06:39,343 [impl.RootTabletLocator] TRACE: tid=12 oid=91  Found root tablet at host1:40158|1446a02ccfe0002 in 0.000 secs
      2014-02-25 12:06:39,344 [impl.TabletServerBatchWriter] TRACE: Started sending 15 mutations to 1 tablet servers
      2014-02-25 12:06:39,344 [impl.ThriftTransportPool] TRACE: Using existing connection to host1:40158
      2014-02-25 12:06:39,366 [impl.ThriftTransportPool] TRACE: Returned connection host1:40158 (120000) ioCount : 586
      2014-02-25 12:06:39,367 [impl.TabletServerBatchWriter] TRACE: sent 15 mutations to host1:40158 in 0.02 secs (681.82 mutations/sec) with 0 failures
      

      Attachments

        Issue Links

          Activity

            People

              ecn Eric C. Newton
              kturner Keith Turner
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: