Hadoop HDFS / HDFS-8069

Tracing implementation on DFSInputStream seriously degrades performance

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels: None

    Description

      I've been doing some testing of Accumulo with HDFS 2.7.0 and have noticed a serious performance impact when Accumulo registers itself as a SpanReceiver.

      I noticed the impact in a test where an Accumulo process reads a series of updates from a write-ahead log, which amounts to reading a series of Writable objects from a file in HDFS. With tracing enabled, I waited at least 10 minutes and the server still hadn't finished reading a ~300MB file.

      Doing a poor-man's inspection via repeated thread dumps, I always see something like the following:

      "replication task 2" daemon prio=10 tid=0x0000000002842800 nid=0x794d runnable [0x00007f6c7b1ec000]
         java.lang.Thread.State: RUNNABLE
              at java.util.concurrent.CopyOnWriteArrayList.iterator(CopyOnWriteArrayList.java:959)
              at org.apache.htrace.Tracer.deliver(Tracer.java:80)
              at org.apache.htrace.impl.MilliSpan.stop(MilliSpan.java:177)
              - locked <0x000000077a770730> (a org.apache.htrace.impl.MilliSpan)
              at org.apache.htrace.TraceScope.close(TraceScope.java:78)
              at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:898)
              - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
              at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:697)
              - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
              at java.io.DataInputStream.readByte(DataInputStream.java:265)
              at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
              at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
              at org.apache.accumulo.core.data.Mutation.readFields(Mutation.java:951)
             ... more accumulo code omitted...
      

      What I'm seeing here is that reading a single byte (in WritableUtils.readVLong) causes a new Span to be created and closed, and the close includes a flush to the SpanReceiver. This results in an extreme number of spans for DFSInputStream.byteArrayRead just for reading a file from HDFS: over 700k spans to read a file of a few hundred MB.
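
      To illustrate the effect, here is a minimal, self-contained sketch that does not use HDFS or HTrace. CountingStream is a hypothetical stand-in for a traced stream: it counts one "span" per read call, which is an assumption modeled on the stack trace above, not the actual DFSInputStream code. The sketch compares byte-at-a-time reads against the same reads going through a BufferedInputStream. The 64 KiB buffer size is arbitrary.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class SpanPerReadSketch {

    /** Hypothetical stand-in for a traced stream: one "span" per read call. */
    static final class CountingStream extends FilterInputStream {
        long spans = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            spans++; // a real tracer would open and close a span here
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            spans++; // one span per bulk read, regardless of its length
            return super.read(b, off, len);
        }
    }

    /** Reads {@code size} bytes one at a time and returns the number of spans counted. */
    static long countSpans(int size, boolean buffered) {
        CountingStream traced = new CountingStream(new ByteArrayInputStream(new byte[size]));
        // With buffering, single-byte reads are served from memory and only
        // buffer refills reach the traced stream.
        InputStream source = buffered ? new BufferedInputStream(traced, 64 * 1024) : traced;
        try (DataInputStream in = new DataInputStream(source)) {
            for (int i = 0; i < size; i++) {
                in.readByte(); // same call that appears in the stack trace above
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return traced.spans;
    }

    public static void main(String[] args) {
        int size = 1 << 20; // 1 MiB of dummy data
        System.out.println("unbuffered spans: " + countSpans(size, false));
        System.out.println("buffered spans:   " + countSpans(size, true));
    }
}
```

      In this sketch the unbuffered case produces one span per byte read, while the buffered case produces one per buffer refill. Whether wrapping the stream is an acceptable workaround for Accumulo's write-ahead log reader is a separate question. It does not change how DFSInputStream itself creates spans.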

      Perhaps there's something different we need to do for the SpanReceiver in Accumulo? I'm not entirely sure, but this was rather unexpected.

      cc/ cmccabe

            People

              Assignee: Unassigned
              Reporter: Josh Elser (elserj)
              Votes: 0
              Watchers: 9