Thanks again for kicking the tires on htrace, Billie Rinaldi. Let me see if I can get to the bottom of this.
As documented, each process must configure its own span receivers if it wants to use tracing. If I set hadoop.htrace.span.receiver.classes to the empty string, then the NameNode and DataNode will not do any tracing.
You are right that you need to set hadoop.htrace.span.receiver.classes in the NameNode and DataNode configuration. However, you should not set that key in the Accumulo configuration. Instead, use whatever Accumulo-specific configuration Accumulo provides for choosing its span receivers. This means that, currently, you can't use the same config file for the NN and DN as for the DFSClient.
If span receiver initialization in DFSClient is important to the use of the hadoop.htrace.sampler configuration property, perhaps a compromise would be to perform SpanReceiverHost.getInstance only when the sampler is set to something other than NeverSampler.
Keep in mind that hadoop.htrace.sampler is a completely different configuration key from hadoop.htrace.span.receiver.classes. If you are sampling at the level of Accumulo operations, I would not recommend setting hadoop.htrace.sampler in any config file on the cluster. You want all of the sampling to happen inside Accumulo.
I think Billie Rinaldi is correct here; the client should not instantiate its own SpanReceiverHost, but instead depend on the process in which it resides to provide one. This is how HBase client works as well.
HBase is exactly the same. In the case of HBase, you do not want to set hadoop.htrace.span.receiver.classes in the HBase config files. Instead, you would set hbase.htrace.span.receiver.classes. Then HBase would create a span receiver, and DFSClient would not.
It seems like there is a hidden assumption here that you want to use the same config file for everything. But we really don't support that right now.
Getting rid of the SpanReceiverHost in DFSClient is not an option, since some people want to trace just HDFS without tracing any other system. Plus, it just kicks the problem up to a higher level. If my FooProcess wants to use both HTrace and Accumulo, FooProcess could easily make the same argument that "Accumulo should not instantiate SpanReceiverHost" since FooProcess is already doing that. And since FooProcess uses the Accumulo client, its settings would conflict with whatever Accumulo was configuring if the same config file were used for everything.
One thing we could do to make this a little less painful is to deduplicate span receivers inside the library. So if both DFSClient and Accumulo requested an HTracedSpanReceiver, we could simply create one instance of that. This would allow us to use the same config file for everything.
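To make the deduplication idea concrete, here is a minimal sketch. It assumes a hypothetical ReceiverRegistry keyed by receiver class name; none of these names come from the actual HTrace API, and a plain Object stands in for a real span receiver.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch only: ReceiverRegistry is not part of HTrace.
final class ReceiverRegistry {
    // One receiver instance per requested class name.
    private final Map<String, Object> receivers = new HashMap<>();

    // Returns the existing receiver for className, or creates it on first request.
    synchronized Object getOrCreate(String className, Supplier<Object> factory) {
        return receivers.computeIfAbsent(className, k -> factory.get());
    }

    // Number of distinct receivers created so far.
    synchronized int size() {
        return receivers.size();
    }
}
```

With something like this, DFSClient and Accumulo could both ask for "HTracedSpanReceiver" and get back the same instance, so spans would not be delivered twice.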
As a side note, Billie Rinaldi, can you explain how you configure which sampler and span receiver Accumulo uses? In HBase we set these with hbase.htrace.span.receiver.classes, etc. I would recommend something like accumulo.htrace.span.receiver.classes for consistency. This also allows you to use the same config file for everything, since it doesn't conflict with the keys which Hadoop uses to set these values. That is the reason why we set up the "hbase.htrace" "namespace" as separate from the "hadoop.htrace" "namespace", if you see what I'm saying.
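Here is a rough illustration of why separate prefixes let one config file serve every process. This is a sketch, not how Hadoop or HBase actually read their configuration: TraceConfig and receiverClasses are hypothetical names, and java.util.Properties stands in for the real configuration object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: each system reads only the key under its own prefix.
final class TraceConfig {
    // Returns the receiver class names configured under
    // "<prefix>.htrace.span.receiver.classes", or an empty list if unset.
    static List<String> receiverClasses(Properties conf, String prefix) {
        String raw = conf.getProperty(prefix + ".htrace.span.receiver.classes", "");
        List<String> result = new ArrayList<>();
        for (String name : raw.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

If a shared file sets only the key under the "hbase" prefix, a lookup with prefix "hadoop" finds nothing, so the DFSClient side would create no receiver, while the HBase side sees its own setting.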