Uploaded image for project: 'Accumulo'
  1. Accumulo
  2. ACCUMULO-4457

ConditionalWriterIT.testTrace fails a lot



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.6.6
    • Fix Version/s: None
    • Component/s: test
    • Labels:


      So, I've been trying to test 1.6.6 release candidate, and I keep getting frequent failures of the ConditionalWriterIT.testTrace.

      I've seen two kinds of failures, both of which result in timeouts (because the IT continues to retry until the timeout).

      The first is a very simple case: it can't find any traces. This is probably due to a startup race condition where the client runs before the tracer is up. The spammy output looks like:

      2016-09-14 17:04:26,799 [test.ConditionalWriterIT] INFO : Ignoring trace output as traceCount not greater than zero: 0
      2016-09-14 17:04:27,802 [test.ConditionalWriterIT] INFO : Trace output:Did not find any traces!
      2016-09-14 17:04:27,802 [test.ConditionalWriterIT] INFO : Ignoring trace output as traceCount not greater than zero: 0
      2016-09-14 17:04:28,805 [test.ConditionalWriterIT] INFO : Trace output:Did not find any traces!
      2016-09-14 17:04:28,805 [test.ConditionalWriterIT] INFO : Ignoring trace output as traceCount not greater than zero: 0
      2016-09-14 17:04:29,808 [test.ConditionalWriterIT] INFO : Trace output:Did not find any traces!

      The TraceServer's log file contain no activity after startup (no activity indicated for successful run, either... the tracer probably just doesn't log much, or I don't have the log threshold low enough).

      The second, and more annoying failure is that it appears sometimes, some spans are unrooted.

      The output of a failed run looks like:

      2016-09-13 22:15:06,464 [test.ConditionalWriterIT] INFO : Ignoring trace output as traceCount not greater than zero: -1
      2016-09-13 22:15:07,475 [test.ConditionalWriterIT] INFO : Trace output:Trace started at 2016/09/13 22:15:01.046
      Time  Start  Service@Location       Name
        322+0      testTrace@localhost traceTest
          0+35       testTrace@localhost client:startScan
         27+40       tserver@ startScan
          0+67       testTrace@localhost client:closeScan
          0+68       tserver@ closeScan
          0+68       testTrace@localhost client:startScan
          3+69       tserver@ startScan
          1+70         tserver@ metadata tablets read ahead 7
          0+72       testTrace@localhost client:closeScan
          0+73       tserver@ closeScan
      Warning: the following spans are not rooted!
      98074c50fb57bce3 44a1f6292a6ec858 client:startScan
      803c22e524cfaab4 c0944de077dcfbaa metadata tablets read ahead 2
      c0944de077dcfbaa 44a1f6292a6ec858 startScan
      e87ec35eb1a659ee 44a1f6292a6ec858 client:closeScan
      75e4a9576afc000b 98bfb7e65f27e99 client:startConditionalUpdate
      bd6250a4d3050aba 44a1f6292a6ec858 closeScan
      98bfb7e65f27e99 44a1f6292a6ec858 ConditionalWriterImpl 2
      d83a024f5e4dbae5 98bfb7e65f27e99 startConditionalUpdate
      45472c5b3fc34b0b 98bfb7e65f27e99 client:conditionalUpdate
      42d81ffa77149a94 98bfb7e65f27e99 conditionalUpdate
      2d6c20f144b406cd 42d81ffa77149a94 Check conditions
      715be33f3216fe99 42d81ffa77149a94 apply conditional mutations
      ff11b4df0e8ddd94 715be33f3216fe99 prep
      9acf9f34eeb4c7c1 715be33f3216fe99 wal
      191305dbf1fb40ff 9acf9f34eeb4c7c1 update
      3dd49cdc4b5a2d7d 9acf9f34eeb4c7c1 client:update
      e31c17c89e1b3cdc 191305dbf1fb40ff prep
      84dbf7ebd59be75 191305dbf1fb40ff commit
      198bc516183d8e7f 191305dbf1fb40ff wal
      61b91d25c997c748 715be33f3216fe99 commit

      For comparison, the output of a successful run looks like:

      2016-09-14 17:07:57,125 [test.ConditionalWriterIT] INFO : Ignoring trace output as traceCount not greater than zero: -1
      2016-09-14 17:07:58,134 [test.ConditionalWriterIT] INFO : Trace output:Trace started at 2016/09/14 17:07:54.663
      Time  Start  Service@Location       Name
        428+0      testTrace@localhost traceTest
          3+360      testTrace@localhost client:startScan
          4+365      tserver@ startScan
          0+377      testTrace@localhost client:closeScan
          0+378      tserver@ closeScan
          0+380      testTrace@localhost client:startScan
          5+381      tserver@ startScan
          0+385        tserver@ metadata tablets read ahead 4
          0+388      tserver@ closeScan
          0+388      testTrace@localhost client:closeScan
         39+389      testTrace@localhost ConditionalWriterImpl 1
          1+397        testTrace@localhost client:startConditionalUpdate
          2+402        tserver@ startConditionalUpdate
          2+408        testTrace@localhost client:conditionalUpdate
          8+417        tserver@ conditionalUpdate
          2+419          tserver@ Check conditions
          4+421          tserver@ apply conditional mutations
          1+421            tserver@ prep
          3+422            tserver@ wal 
          0+423              tserver@ client:update
          2+423              tserver@ update
          0+424                tserver@ wal 
          0+424                tserver@ prep
          0+425                tserver@ commit
          0+425            tserver@ commit
      The following is all the nodes!
      c63771e049ce64df 0 traceTest
      346315d5c08dacd c63771e049ce64df client:startScan
      83a6b294967ec2c2 c63771e049ce64df startScan
      c31f792dd3cf0479 c63771e049ce64df client:closeScan
      6847784061285c9f c63771e049ce64df closeScan
      6675da322b33924f c63771e049ce64df client:startScan
      874c41927959277a c63771e049ce64df startScan
      7f26ece0e74cebd3 874c41927959277a metadata tablets read ahead 4
      7e4d4904313a0096 c63771e049ce64df closeScan
      9324832a57fbe3c1 c63771e049ce64df client:closeScan
      e6cccd862b014fc2 c63771e049ce64df ConditionalWriterImpl 1
      e8c2e500c3b87beb e6cccd862b014fc2 client:startConditionalUpdate
      1bbe4c4bc22565d9 e6cccd862b014fc2 startConditionalUpdate
      f13d4a0dfbb322ed e6cccd862b014fc2 client:conditionalUpdate
      2ec39d24415847b1 e6cccd862b014fc2 conditionalUpdate
      2b076238cd26cb8f 2ec39d24415847b1 Check conditions
      c0addb5c7a6b3fca 2ec39d24415847b1 apply conditional mutations
      2dd63e5ea321119d c0addb5c7a6b3fca prep
      d6958323b0cb7f5e c0addb5c7a6b3fca wal 
      616296d843722d0a d6958323b0cb7f5e client:update
      db05c61214e16ce5 d6958323b0cb7f5e update
      9d3417d7b56a21e0 db05c61214e16ce5 prep
      2d66569d22a19871 db05c61214e16ce5 wal 
      21cd8265be1e6c94 c0addb5c7a6b3fca commit
      b381168631ad019c db05c61214e16ce5 commit

      I added code to dump the span IDs and parent IDs for the rooted spans, to more easily compare.

      The biggest thing that stands out for me is that the one with the description ConditionalWriterImpl 2 is not rooted at "testTrace" like the ConditionalWriterImpl 1 is in the successful case.

      As far as I can tell, the failure case is always case 2, and the successful case is always case 1.

      It looks like the problem is caused by the NamingThreadFactory creating wrapping Runnable s with a new TraceRunnable based on the current thread's currentSpan. It's not known which thread is creating new threads within the thread pool's executor. Instead of having NamingThreadFactory wrap when it creates a new executor thread, we need to ensure we wrap the spans before we place them into the thread pool (when we call execute()), based on the originating thread's currentSpan.

      The NamingThreadFactory is used in multiple places, so it may not be an easy fix to address this in all places.




            • Assignee:
              ctubbsii Christopher Tubbs
            • Votes:
              0 Vote for this issue
              1 Start watching this issue


              • Created: