Uploaded image for project: 'Accumulo'
  1. Accumulo
  2. ACCUMULO-3853

Contention around ConcurrentLinkedQueue.size() in AsyncSpanReceiver

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.7.0
    • 1.7.1, 1.8.0
    • trace
    • None

    Description

      Trying to debug slowness in the replication merkle test, I saw a lot of stack traces sitting here:

      "replication task 1" #297 daemon prio=5 os_prio=0 tid=0x0000000004624000 nid=0x6b46 runnable [0x00007ff8867b6000]
         java.lang.Thread.State: RUNNABLE
              at java.util.concurrent.ConcurrentLinkedQueue.size(ConcurrentLinkedQueue.java:450)
              at org.apache.accumulo.tracer.AsyncSpanReceiver.receiveSpan(AsyncSpanReceiver.java:171)
              at org.apache.htrace.Tracer.deliver(Tracer.java:81)
              at org.apache.htrace.impl.MilliSpan.stop(MilliSpan.java:177)
              - locked <0x00000000a7fe3560> (a org.apache.htrace.impl.MilliSpan)
              at org.apache.htrace.TraceScope.close(TraceScope.java:78)
              at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:898)
              - locked <0x00000000b0086118> (a org.apache.hadoop.hdfs.DFSInputStream)
              at java.io.DataInputStream.readFully(DataInputStream.java:195)
              at java.io.DataInputStream.readFully(DataInputStream.java:169)
              at org.apache.accumulo.core.data.Mutation.readFields(Mutation.java:965)
              at org.apache.accumulo.server.data.ServerMutation.readFields(ServerMutation.java:52)
              at org.apache.accumulo.tserver.logger.LogFileValue.readFields(LogFileValue.java:45)
              at org.apache.accumulo.tserver.replication.AccumuloReplicaSystem.getWalEdits(AccumuloReplicaSystem.java:702)
              at org.apache.accumulo.tserver.replication.AccumuloReplicaSystem$WalClientExecReturn.execute(AccumuloReplicaSystem.java:531)
              at org.apache.accumulo.tserver.replication.AccumuloReplicaSystem$WalClientExecReturn.execute(AccumuloReplicaSystem.java:506)
              at org.apache.accumulo.core.client.impl.ReplicationClient.executeServicerWithReturn(ReplicationClient.java:189)
              at org.apache.accumulo.tserver.replication.AccumuloReplicaSystem.replicateLogs(AccumuloReplicaSystem.java:429)
              at org.apache.accumulo.tserver.replication.AccumuloReplicaSystem._replicate(AccumuloReplicaSystem.java:291)
              at org.apache.accumulo.tserver.replication.AccumuloReplicaSystem.replicate(AccumuloReplicaSystem.java:212)
              at org.apache.accumulo.tserver.replication.ReplicationProcessor.process(ReplicationProcessor.java:134)
              at org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:107)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
              at java.lang.Thread.run(Thread.java:745)
      
         Locked ownable synchronizers:
              - <0x00000000b00dbf48> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      

      Seems like this might still be related to HDFS-8069, but one odd thing is that we were sitting in the size() method.

      For those who don't remember, size() is a big o N operation on ConcurrentLinkedQueue. Calling the size() method every time we receive a new span is a little nasty.

      Attachments

        Activity

          People

            elserj Josh Elser
            elserj Josh Elser
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 40m
                40m