Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-8213

DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.8.0, 2.7.1, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this.

      1. HDFS-8213.001.patch
        10 kB
        Colin P. McCabe
      2. HDFS-8213.002.patch
        17 kB
        Colin P. McCabe

        Issue Links

          Activity

          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2131 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2131/)
          HDFS-8213. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2131 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2131/ ) HDFS-8213 . DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #182 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/182/)
          HDFS-8213. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #182 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/182/ ) HDFS-8213 . DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #2113 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2113/)
          HDFS-8213. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #2113 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2113/ ) HDFS-8213 . DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #172 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/172/)
          HDFS-8213. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #172 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/172/ ) HDFS-8213 . DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #915 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/915/)
          HDFS-8213. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #915 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/915/ ) HDFS-8213 . DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #181 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/181/)
          HDFS-8213. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #181 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/181/ ) HDFS-8213 . DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          Hide
          cmccabe Colin P. McCabe added a comment -

          committed to 2.7.1. thanks, guys.

          Show
          cmccabe Colin P. McCabe added a comment - committed to 2.7.1. thanks, guys.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7714 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7714/)
          HDFS-8213. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #7714 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7714/ ) HDFS-8213 . DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (cmccabe) (cmccabe: rev b82567d45507c50d2f28eff4bbdf3b1a69d4bf1b) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
          Hide
          cmccabe Colin P. McCabe added a comment -

          TestFileTruncate warning is unrelated. checkstyle continues to be busted. committing...

          Show
          cmccabe Colin P. McCabe added a comment - TestFileTruncate warning is unrelated. checkstyle continues to be busted. committing...
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 39s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 4 new or modified test files.
          +1 javac 7m 26s There were no new javac warning messages.
          +1 javadoc 9m 38s There were no new javadoc warning messages.
          +1 release audit 0m 21s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 47s The applied patch generated 2 new checkstyle issues (total was 17, now 16).
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 37s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          +1 findbugs 4m 42s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 common tests 23m 32s Tests passed in hadoop-common.
          -1 hdfs tests 163m 48s Tests failed in hadoop-hdfs.
              228m 31s  



          Reason Tests
          Failed unit tests hadoop.hdfs.server.namenode.TestFileTruncate



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12728982/HDFS-8213.002.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 98a6176
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/10499/artifact/patchprocess/diffcheckstylehadoop-common.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/10499/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/10499/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/10499/testReport/
          Java 1.7.0_55
          uname Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10499/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 14m 39s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 4 new or modified test files. +1 javac 7m 26s There were no new javac warning messages. +1 javadoc 9m 38s There were no new javadoc warning messages. +1 release audit 0m 21s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 1m 47s The applied patch generated 2 new checkstyle issues (total was 17, now 16). +1 whitespace 0m 1s The patch has no lines that end in whitespace. +1 install 1m 37s mvn install still works. +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse. +1 findbugs 4m 42s The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 common tests 23m 32s Tests passed in hadoop-common. -1 hdfs tests 163m 48s Tests failed in hadoop-hdfs.     228m 31s   Reason Tests Failed unit tests hadoop.hdfs.server.namenode.TestFileTruncate Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12728982/HDFS-8213.002.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 98a6176 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/10499/artifact/patchprocess/diffcheckstylehadoop-common.txt hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/10499/artifact/patchprocess/testrun_hadoop-common.txt hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/10499/artifact/patchprocess/testrun_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/10499/testReport/ Java 1.7.0_55 uname Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10499/console This message was automatically generated.
          Hide
          cmccabe Colin P. McCabe added a comment - - edited

          findbugs warning is bogus. patch doesn't modify org.apache.hadoop.hdfs.DataStreamer$LastException. the rest of the stuff looks bogus as well (a lot of test timeouts on random things that aren't enabling / touching tracing), guess it's time to re-run again

          Show
          cmccabe Colin P. McCabe added a comment - - edited findbugs warning is bogus. patch doesn't modify org.apache.hadoop.hdfs.DataStreamer$LastException. the rest of the stuff looks bogus as well (a lot of test timeouts on random things that aren't enabling / touching tracing), guess it's time to re-run again
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 33s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 4 new or modified test files.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 javac 7m 29s There were no new javac warning messages.
          +1 javadoc 9m 37s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 5m 21s The applied patch generated 1 additional checkstyle issues.
          +1 install 1m 34s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          -1 findbugs 4m 47s The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.
          +1 common tests 23m 3s Tests passed in hadoop-common.
          -1 hdfs tests 227m 12s Tests failed in hadoop-hdfs.
              294m 35s  



          Reason Tests
          FindBugs module:hadoop-hdfs
            Class org.apache.hadoop.hdfs.DataStreamer$LastException is not derived from an Exception, even though it is named as such At DataStreamer.java:from an Exception, even though it is named as such At DataStreamer.java:[lines 177-201]
          Failed unit tests hadoop.hdfs.server.datanode.TestBlockRecovery
            hadoop.hdfs.TestDFSClientRetries
            hadoop.hdfs.TestQuota
            hadoop.cli.TestHDFSCLI
            hadoop.hdfs.TestClose
            hadoop.hdfs.TestCrcCorruption
            hadoop.hdfs.TestMultiThreadedHflush
            hadoop.hdfs.TestFileLengthOnClusterRestart
            hadoop.hdfs.TestDFSOutputStream
            hadoop.hdfs.qjournal.TestNNWithQJM
            hadoop.hdfs.server.namenode.TestDeleteRace
            hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation
          Timed out tests org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache
            org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
            org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery
            org.apache.hadoop.hdfs.TestDataTransferProtocol



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12728982/HDFS-8213.002.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / c55d609
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/10489/artifact/patchprocess/checkstyle-result-diff.txt
          Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/10489/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/10489/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/10489/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/10489/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10489/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 14m 33s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 4 new or modified test files. +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 javac 7m 29s There were no new javac warning messages. +1 javadoc 9m 37s There were no new javadoc warning messages. +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 5m 21s The applied patch generated 1 additional checkstyle issues. +1 install 1m 34s mvn install still works. +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse. -1 findbugs 4m 47s The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. +1 common tests 23m 3s Tests passed in hadoop-common. -1 hdfs tests 227m 12s Tests failed in hadoop-hdfs.     294m 35s   Reason Tests FindBugs module:hadoop-hdfs   Class org.apache.hadoop.hdfs.DataStreamer$LastException is not derived from an Exception, even though it is named as such At DataStreamer.java:from an Exception, even though it is named as such At DataStreamer.java: [lines 177-201] Failed unit tests hadoop.hdfs.server.datanode.TestBlockRecovery   hadoop.hdfs.TestDFSClientRetries   hadoop.hdfs.TestQuota   hadoop.cli.TestHDFSCLI   hadoop.hdfs.TestClose   hadoop.hdfs.TestCrcCorruption   hadoop.hdfs.TestMultiThreadedHflush   hadoop.hdfs.TestFileLengthOnClusterRestart   hadoop.hdfs.TestDFSOutputStream   hadoop.hdfs.qjournal.TestNNWithQJM   hadoop.hdfs.server.namenode.TestDeleteRace   hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation Timed out tests org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache   org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer   org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery   org.apache.hadoop.hdfs.TestDataTransferProtocol Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12728982/HDFS-8213.002.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / c55d609 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/10489/artifact/patchprocess/checkstyle-result-diff.txt Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/10489/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/10489/artifact/patchprocess/testrun_hadoop-common.txt hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/10489/artifact/patchprocess/testrun_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/10489/testReport/ Java 1.7.0_55 uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10489/console This message was automatically generated.
          Hide
          stack stack added a comment -

          +1 from me.

          Show
          stack stack added a comment - +1 from me.
          Hide
          iwasakims Masatake Iwasaki added a comment -

          Thanks for the update, Colin P. McCabe. I'm +1(non-binding) for 002.

          Let's do the hdfs-default.xml and other docs stuff later since it's not directly related to this

          Yeah. I filed HDFS-8284.

          Show
          iwasakims Masatake Iwasaki added a comment - Thanks for the update, Colin P. McCabe . I'm +1(non-binding) for 002. Let's do the hdfs-default.xml and other docs stuff later since it's not directly related to this Yeah. I filed HDFS-8284 .
          Hide
          hadoopqa Hadoop QA added a comment -

          The patch artifact directory on has been removed!
          This is a fatal error for test-patch.sh. Aborting.
          Jenkins (node H4) information at https://builds.apache.org/job/PreCommit-HDFS-Build/10444/ may provide some hints.

          Show
          hadoopqa Hadoop QA added a comment - The patch artifact directory on has been removed! This is a fatal error for test-patch.sh. Aborting. Jenkins (node H4) information at https://builds.apache.org/job/PreCommit-HDFS-Build/10444/ may provide some hints.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Thanks for the review, Masatake Iwasaki. I attached a patch. Let's do the hdfs-default.xml and other docs stuff later since it's not directly related to this

          Show
          cmccabe Colin P. McCabe added a comment - Thanks for the review, Masatake Iwasaki . I attached a patch. Let's do the hdfs-default.xml and other docs stuff later since it's not directly related to this
          Hide
          cmccabe Colin P. McCabe added a comment -

          In SpanReceiverHost#getInstance, loadSpanReceivers is called even if there is already initialized SRH instance. Is it intentional?

          Hmm. Good point... we don't want to be calling this more than once. Let's have a SpanReceiverHost for each config prefix. That's the easiest thing to do. Long-term, I think we should have a new API that avoids the need for all this boilerplate code in the client...

          We need to fix TraceUtils#wrapHadoopConf which always assumes that prefix is "hadoop.htrace.".

          fixed

          Should we add entry for hdfs.client.htrace.spanreceiver.classes to hdfs-default.xml?

          yeah

          Show
          cmccabe Colin P. McCabe added a comment - In SpanReceiverHost#getInstance, loadSpanReceivers is called even if there is already initialized SRH instance. Is it intentional? Hmm. Good point... we don't want to be calling this more than once. Let's have a SpanReceiverHost for each config prefix. That's the easiest thing to do. Long-term, I think we should have a new API that avoids the need for all this boilerplate code in the client... We need to fix TraceUtils#wrapHadoopConf which always assumes that prefix is "hadoop.htrace.". fixed Should we add entry for hdfs.client.htrace.spanreceiver.classes to hdfs-default.xml? yeah
          Hide
          iwasakims Masatake Iwasaki added a comment -

          I agree to use independent config keys for DFSClient in order to make end-to-end tracing from HBase/Accumulo to HDFS work.

          Few comments for 001.

          In SpanReceiverHost#getInstance, loadSpanReceivers is called even if there is already initialized SRH instance. Is it intentional?

              synchronized (SingletonHolder.INSTANCE.lock) {
                if (SingletonHolder.INSTANCE.host == null) {
                  SingletonHolder.INSTANCE.host = new SpanReceiverHost();
                }
                SingletonHolder.INSTANCE.host.loadSpanReceivers(conf, configPrefix);
                ShutdownHookManager.get().addShutdownHook(new Runnable() {
                    public void run() {
                      SingletonHolder.INSTANCE.host.closeReceivers();
                    }
                  }, 0);
                return SingletonHolder.INSTANCE.host;
          


          We need to fix TraceUtils#wrapHadoopConf which always assumes that prefix is "hadoop.htrace.".

          public class TraceUtils {
            public static final String HTRACE_CONF_PREFIX = "hadoop.htrace.";
          


          Should we add entry for hdfs.client.htrace.spanreceiver.classes to hdfs-default.xml?

          Show
          iwasakims Masatake Iwasaki added a comment - I agree to use independent config keys for DFSClient in order to make end-to-end tracing from HBase/Accumulo to HDFS work. Few comments for 001. — In SpanReceiverHost#getInstance , loadSpanReceivers is called even if there is already initialized SRH instance. Is it intentional? synchronized (SingletonHolder.INSTANCE.lock) { if (SingletonHolder.INSTANCE.host == null ) { SingletonHolder.INSTANCE.host = new SpanReceiverHost(); } SingletonHolder.INSTANCE.host.loadSpanReceivers(conf, configPrefix); ShutdownHookManager.get().addShutdownHook( new Runnable () { public void run() { SingletonHolder.INSTANCE.host.closeReceivers(); } }, 0); return SingletonHolder.INSTANCE.host; — We need to fix TraceUtils#wrapHadoopConf which always assumes that prefix is "hadoop.htrace.". public class TraceUtils { public static final String HTRACE_CONF_PREFIX = "hadoop.htrace." ; — Should we add entry for hdfs.client.htrace.spanreceiver.classes to hdfs-default.xml?
          Hide
          cmccabe Colin P. McCabe added a comment -

          What that doesn't clarify to me is how I would connect the dots of spans initiated within the HDFSClient back to actions takes by said app.

          It depends on what we're trying to do. For example, we may be getting reports that the cluster is slow. In this case, seeing that HDFS / HBase requests complete quickly allows us to focus on other systems in the stack.

          Ultimately, the best thing will always be to have tracing in every app. But it will take a while to get there and having the ability to get useful results out of incremental steps is really useful.

          Show
          cmccabe Colin P. McCabe added a comment - What that doesn't clarify to me is how I would connect the dots of spans initiated within the HDFSClient back to actions takes by said app. It depends on what we're trying to do. For example, we may be getting reports that the cluster is slow. In this case, seeing that HDFS / HBase requests complete quickly allows us to focus on other systems in the stack. Ultimately, the best thing will always be to have tracing in every app. But it will take a while to get there and having the ability to get useful results out of incremental steps is really useful.
          Hide
          ndimiduk Nick Dimiduk added a comment -

          Hit post too soon...

          What that doesn't clarify to me is how I would connect the dots of spans initiated within the HDFSClient back to actions takes by said app.

          Show
          ndimiduk Nick Dimiduk added a comment - Hit post too soon... What that doesn't clarify to me is how I would connect the dots of spans initiated within the HDFSClient back to actions takes by said app.
          Hide
          ndimiduk Nick Dimiduk added a comment -

          Another very important use case of tracing is "I have proprietary app X that talks to HDFS, and it's slow. How come?"

          Bingo! That's the perspective I was missing. I hadn't considered a proprietary app market around these tools; being a consumer of some app running on my cluster and wanting visibility into it w/o the app supporting tracing explicitly. Thanks Colin P. McCabe! Please socialize this idea more widely

          Show
          ndimiduk Nick Dimiduk added a comment - Another very important use case of tracing is "I have proprietary app X that talks to HDFS, and it's slow. How come?" Bingo! That's the perspective I was missing. I hadn't considered a proprietary app market around these tools; being a consumer of some app running on my cluster and wanting visibility into it w/o the app supporting tracing explicitly. Thanks Colin P. McCabe ! Please socialize this idea more widely
          Hide
          cmccabe Colin P. McCabe added a comment -

          Thanks for that perspective, Nick Dimiduk.

          I actually don't see any conflict between allowing the client to trace itself, and allowing the application to trace itself. We should be able to support both use-cases. The people who don't want to have the client initiate tracing can simply not set hdfs.client.htrace.spanreceiver.classes and hdfs.client.trace.sampler.

          One very important use-case for HTrace is "how can HBase figure out what HDFS is doing." For this use-case, of course, we don't need the client to initiate tracing... HBase can simply change its code to have the relevant calls to HTrace, and then that will get picked up by DFSClient, DataNode, NN, etc. I think this is the use-case you guys have been focusing on, and understandably so. But this is only one use-case of many. Another very important use case of tracing is "I have proprietary app X that talks to HDFS, and it's slow. How come?" For that use-case, we need to be able to have the DFSClient initiate the tracing, since we don't have the source code for the proprietary app (or if we do, modifying it and redeploying it may require a lengthy admin process.)

          Should HBase and Accumulo clients be providing the same?

          I believe they should. It would be nice to be able to figure out why HBase is slow for some arbitrary workload, without hacking the client. I would like to be able to give a talk about profiling HBase that doesn't start with "first, modify your source code in ways X, Y, and Z"... it's much nicer to tell people to set a config option. Otherwise I feel like I'm telling people to write a mapreduce job in erlang... and you know what that really means I'm telling them This is especially true for non-devs.

          I think we could also improve our API to make it less likely (or maybe even impossible) for client and server tracing configs to conflict so much. I have some ideas for how to do that which I'll take a look at in a follow-on jira

          Show
          cmccabe Colin P. McCabe added a comment - Thanks for that perspective, Nick Dimiduk . I actually don't see any conflict between allowing the client to trace itself, and allowing the application to trace itself. We should be able to support both use-cases. The people who don't want to have the client initiate tracing can simply not set hdfs.client.htrace.spanreceiver.classes and hdfs.client.trace.sampler . One very important use-case for HTrace is "how can HBase figure out what HDFS is doing." For this use-case, of course, we don't need the client to initiate tracing... HBase can simply change its code to have the relevant calls to HTrace, and then that will get picked up by DFSClient, DataNode, NN, etc. I think this is the use-case you guys have been focusing on, and understandably so. But this is only one use-case of many. Another very important use case of tracing is "I have proprietary app X that talks to HDFS, and it's slow. How come?" For that use-case, we need to be able to have the DFSClient initiate the tracing, since we don't have the source code for the proprietary app (or if we do, modifying it and redeploying it may require a lengthy admin process.) Should HBase and Accumulo clients be providing the same? I believe they should. It would be nice to be able to figure out why HBase is slow for some arbitrary workload, without hacking the client. I would like to be able to give a talk about profiling HBase that doesn't start with "first, modify your source code in ways X, Y, and Z"... it's much nicer to tell people to set a config option. Otherwise I feel like I'm telling people to write a mapreduce job in erlang... and you know what that really means I'm telling them This is especially true for non-devs. I think we could also improve our API to make it less likely (or maybe even impossible) for client and server tracing configs to conflict so much. I have some ideas for how to do that which I'll take a look at in a follow-on jira
          Hide
          ndimiduk Nick Dimiduk added a comment -

          Nick Dimiduk, what if I want to trace an HBase PUT all the way through the system? You're saying that the HBase client can't activate tracing on its own, so I have to make code changes to the process doing the PUT (i.e. the user of the HBase client) in order to get that info? Seems like a limitation.

          I think your characterization is accurate. HBase client is a library embedded in applications. The client doesn't enable tracing, the application does. HBase client faithfully extends the calling context through the system by enabling tracing on requests for which the application has initialized a trace request. It delegates responsibility for receiver registration to that calling context as well. For applications that exist before tracing, yes, this implies code changes to add support for the new feature.

          For instance, I have a main program and provide an option for enabling tracing on some percentage of requests. This context is set above the hbase client, which simply honors the context in which it is run. I imaging FsShell would work this way, enabling a trace for a specific invocation of a command. Phoenix's integration works this way too, with a session flag for enabling or disabling tracing managed from the connection or user session. From screen shots I've seen, I believe Cassandra's tracing feature works this way as well, though I don't know if it's using HTrace.

          In a way, the application that embeds the hbase client must also participate in the cluster tracing configuration. It must configure its span receiver to send spends to the same store as the rest of the cluster for the full context to be available.

          hadoop.htrace.span.receiver.classes would be set in the NN and DN configuration files, but not in the ones targetted towards the DFSClients. Does Ambari do something similar?

          Last I looked, Ambari appeared to drop an hdfs-site.xml file into /etc/hbase/conf, which is identical to the one in /etc/hadoop/conf. I don't know if this is different from the configs used by the daemon processes.

          There are hundreds or maybe even thousands of programs that use the HDFS client. It's not practical to modify them all to run Trace#addSpanReceiver. In some cases the programs that use HDFS are even proprietary or customer programs where we don't have access to the source code.

          I'm not sure where the "final responsibility" for instantiating the snap receiver is supposed to lie; I always thought it was with process. HDFSClient seems like an arbitrary place. Should HBase and Accumulo clients be providing the same? Phoenix? My perspective is that there's a singleton list of instances for the process, so it's the responsibility of the main(), not one of the libraries it's imported. So yes, I am suggesting that it is the responsibility of those hundreds or thousands of programs who have all written a main method to setup span receivers for their process in order to take advantage of tracing. The approach you describe hadn't occurred to me until just now.

          Show
          ndimiduk Nick Dimiduk added a comment - Nick Dimiduk, what if I want to trace an HBase PUT all the way through the system? You're saying that the HBase client can't activate tracing on its own, so I have to make code changes to the process doing the PUT (i.e. the user of the HBase client) in order to get that info? Seems like a limitation. I think your characterization is accurate. HBase client is a library embedded in applications. The client doesn't enable tracing, the application does. HBase client faithfully extends the calling context through the system by enabling tracing on requests for which the application has initialized a trace request. It delegates responsibility for receiver registration to that calling context as well. For applications that exist before tracing, yes, this implies code changes to add support for the new feature. For instance, I have a main program and provide an option for enabling tracing on some percentage of requests. This context is set above the hbase client, which simply honors the context in which it is run. I imaging FsShell would work this way, enabling a trace for a specific invocation of a command. Phoenix's integration works this way too, with a session flag for enabling or disabling tracing managed from the connection or user session. From screen shots I've seen, I believe Cassandra's tracing feature works this way as well, though I don't know if it's using HTrace. In a way, the application that embeds the hbase client must also participate in the cluster tracing configuration. It must configure its span receiver to send spends to the same store as the rest of the cluster for the full context to be available. hadoop.htrace.span.receiver.classes would be set in the NN and DN configuration files, but not in the ones targetted towards the DFSClients. Does Ambari do something similar? Last I looked, Ambari appeared to drop an hdfs-site.xml file into /etc/hbase/conf, which is identical to the one in /etc/hadoop/conf. I don't know if this is different from the configs used by the daemon processes. There are hundreds or maybe even thousands of programs that use the HDFS client. It's not practical to modify them all to run Trace#addSpanReceiver. In some cases the programs that use HDFS are even proprietary or customer programs where we don't have access to the source code. I'm not sure where the "final responsibility" for instantiating the snap receiver is supposed to lie; I always thought it was with process. HDFSClient seems like an arbitrary place. Should HBase and Accumulo clients be providing the same? Phoenix? My perspective is that there's a singleton list of instances for the process, so it's the responsibility of the main(), not one of the libraries it's imported. So yes, I am suggesting that it is the responsibility of those hundreds or thousands of programs who have all written a main method to setup span receivers for their process in order to take advantage of tracing. The approach you describe hadn't occurred to me until just now.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 30s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 3 new or modified test files.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 javac 7m 25s There were no new javac warning messages.
          +1 javadoc 9m 36s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 7m 41s The applied patch generated 1 additional checkstyle issues.
          +1 install 1m 33s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          +1 findbugs 4m 44s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 common tests 22m 50s Tests passed in hadoop-common.
          +1 hdfs tests 167m 16s Tests passed in hadoop-hdfs.
              236m 34s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12727720/HDFS-8213.001.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / ef4e996
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/10361/artifact/patchprocess/checkstyle-result-diff.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/10361/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/10361/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/10361/testReport/
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10361/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 14m 30s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 3 new or modified test files. +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 javac 7m 25s There were no new javac warning messages. +1 javadoc 9m 36s There were no new javadoc warning messages. +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 7m 41s The applied patch generated 1 additional checkstyle issues. +1 install 1m 33s mvn install still works. +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse. +1 findbugs 4m 44s The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 common tests 22m 50s Tests passed in hadoop-common. +1 hdfs tests 167m 16s Tests passed in hadoop-hdfs.     236m 34s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12727720/HDFS-8213.001.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / ef4e996 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/10361/artifact/patchprocess/checkstyle-result-diff.txt hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/10361/artifact/patchprocess/testrun_hadoop-common.txt hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/10361/artifact/patchprocess/testrun_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/10361/testReport/ Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10361/console This message was automatically generated.
          Hide
          cmccabe Colin P. McCabe added a comment -

          can you people suggest configuration for DFSClient..?

          I'm thinking hdfs.client.htrace.spanreceiver.classes. It's not completely trivial because I have to change our SpanReceiverHost thing, but shouldn't be too bad... let me see if I can post the patch

          Show
          cmccabe Colin P. McCabe added a comment - can you people suggest configuration for DFSClient..? I'm thinking hdfs.client.htrace.spanreceiver.classes . It's not completely trivial because I have to change our SpanReceiverHost thing, but shouldn't be too bad... let me see if I can post the patch
          Hide
          brahmareddy Brahma Reddy Battula added a comment -

          Thanks Billie Rinaldi for raising issue and Colin P. McCabe for your inputs ..

          can you people suggest configuration for DFSClient..?

          Show
          brahmareddy Brahma Reddy Battula added a comment - Thanks Billie Rinaldi for raising issue and Colin P. McCabe for your inputs .. can you people suggest configuration for DFSClient..?
          Hide
          billie.rinaldi Billie Rinaldi added a comment -

          I agree, having DFSClient use a separate configuration key seems like it would satisfy all the issues we've discussed. Thanks for coming up with a solution that works for both of us, Colin P. McCabe.

          Show
          billie.rinaldi Billie Rinaldi added a comment - I agree, having DFSClient use a separate configuration key seems like it would satisfy all the issues we've discussed. Thanks for coming up with a solution that works for both of us, Colin P. McCabe .
          Hide
          cmccabe Colin P. McCabe added a comment -

          Yes, clients need tracing, and when they do they should enable it themselves. FsShell should enable tracing when it wants to use it, instead of doing that in DFSClient.

          There are hundreds or maybe even thousands of programs that use the HDFS client. It's not practical to modify them all to run Trace#addSpanReceiver. In some cases the programs that use HDFS are even proprietary or customer programs where we don't have access to the source code.

          I have some ideas for how to make this all work better with a better interface in Tracer. We might need an incompatible interface change to do it, though. For now, let's just change the config key for DFSClient... that should fix the problem for Accumulo.

          Show
          cmccabe Colin P. McCabe added a comment - Yes, clients need tracing, and when they do they should enable it themselves. FsShell should enable tracing when it wants to use it, instead of doing that in DFSClient. There are hundreds or maybe even thousands of programs that use the HDFS client. It's not practical to modify them all to run Trace#addSpanReceiver . In some cases the programs that use HDFS are even proprietary or customer programs where we don't have access to the source code. I have some ideas for how to make this all work better with a better interface in Tracer . We might need an incompatible interface change to do it, though. For now, let's just change the config key for DFSClient... that should fix the problem for Accumulo.
          Hide
          cmccabe Colin P. McCabe added a comment -

          I think Billie Rinaldi is correct here; the client should not instantiate it's own SpanReceiverHost, but instead depend on the process in which it resides to provide. This is how HBase client works as well.

          Nick Dimiduk, what if I want to trace an HBase PUT all the way through the system? You're saying that the HBase client can't activate tracing on its own, so I have to make code changes to the process doing the PUT (i.e. the user of the HBase client) in order to get that info? Seems like a limitation.

          It's also worth pointing out that adding a SpanReceiverHost to the DFSClient is not really a new change... it goes back to HDFS-7055, last October. So it's been in there at least 6 months. Of course we can revisit it if that makes sense, but it's not really "new" except in the sense that it took a very long time to do another Hadoop release with that in it. (We really should start being better with releases...)

          Thinking about this a little more, another possible resolution here is to change the configuration keys which the DFSClient looks for so that it's different than the ones which the NameNode and DataNode look for. Right now hadoop.htrace.spanreceiver.classes will activate span receivers in both the NN and the DFSClient. But the DFSClient could instead look for hdfs.client.htrace.spanreceiver.classes. Then Billie Rinaldi could use the same configuration file for everything, and the dfsclient would never create its own span receivers or samplers. And I could continue to trace the dfsclient without modifying daemon code. Seems like a good resolution.

          Show
          cmccabe Colin P. McCabe added a comment - I think Billie Rinaldi is correct here; the client should not instantiate it's own SpanReceiverHost, but instead depend on the process in which it resides to provide. This is how HBase client works as well. Nick Dimiduk , what if I want to trace an HBase PUT all the way through the system? You're saying that the HBase client can't activate tracing on its own, so I have to make code changes to the process doing the PUT (i.e. the user of the HBase client) in order to get that info? Seems like a limitation. It's also worth pointing out that adding a SpanReceiverHost to the DFSClient is not really a new change... it goes back to HDFS-7055 , last October. So it's been in there at least 6 months. Of course we can revisit it if that makes sense, but it's not really "new" except in the sense that it took a very long time to do another Hadoop release with that in it. (We really should start being better with releases...) Thinking about this a little more, another possible resolution here is to change the configuration keys which the DFSClient looks for so that it's different than the ones which the NameNode and DataNode look for. Right now hadoop.htrace.spanreceiver.classes will activate span receivers in both the NN and the DFSClient. But the DFSClient could instead look for hdfs.client.htrace.spanreceiver.classes . Then Billie Rinaldi could use the same configuration file for everything, and the dfsclient would never create its own span receivers or samplers. And I could continue to trace the dfsclient without modifying daemon code. Seems like a good resolution.
          Hide
          billie.rinaldi Billie Rinaldi added a comment -

          Yes, clients need tracing, and when they do they should enable it themselves. FsShell should enable tracing when it wants to use it, instead of doing that in DFSClient.

          A solution in 2.7.1 would be sufficient. I have concerns about bumping the htrace dependency version as htrace 3.2 has many API incompatibilities with 3.1. I've begun to compile them on ACCUMULO-3741.

          Show
          billie.rinaldi Billie Rinaldi added a comment - Yes, clients need tracing, and when they do they should enable it themselves. FsShell should enable tracing when it wants to use it, instead of doing that in DFSClient. A solution in 2.7.1 would be sufficient. I have concerns about bumping the htrace dependency version as htrace 3.2 has many API incompatibilities with 3.1. I've begun to compile them on ACCUMULO-3741 .
          Hide
          cmccabe Colin P. McCabe added a comment -

          The hadoop.htrace.span.receiver.classes is not set in Accumulo configuration files, but it is set in Hadoop configuration files. Accumulo uses Hadoop configuration files to connect to HDFS, thus its uses of DFSClient will have Hadoop's hadoop.htrace.span.receiver.classes. HBase does something similar, I believe.

          The way Cloudera Manager manages configuration files is that it creates separate config files for each daemon. So the NameNode reads its own set of config files, the DataNode reads a separate set, Hive reads another set, Flume reads still another set, etc. etc. So hadoop.htrace.span.receiver.classes would be set in the NN and DN configuration files, but not in the ones targetted towards the DFSClients. Does Ambari do something similar? It seems like using the same set of configuration files for everything would be very limiting, if you wanted to do something like turn on short circuit for some clients but not for others, etc.

          I know from a developer perspective it's frustrating to not be able to use the same config files for every daemon (I like to do that myself) but it's not broken, just inconvenient.

          No. The way it works (did work, until this change was introduced in DFSClient) is that server processes instantiate SpanReceiverHost. If an app wants tracing, it also has to instantiate SpanReceiverHost. The Accumulo client does not instantiate SPH itself, as DFSClient should not.

          It's not true that only server processes need tracing. Clients also need tracing. For example, one test I do a lot is to run FsShell with tracing turned on. This would not be possible if only servers had tracing. The point that I was making with my example is that the Accumulo client itself probably should have tracing too, and this would potentially conflict with another server using the Accumulo client.

          The change in DFSClient changes how apps are supposed to use tracing. It seems like this would be mitigated by deduping SpanReceivers in htrace, but if we go that route I would like the DFSClient change to be reverted until HDFS moves to a version of htrace with deduping. Otherwise, Accumulo and HBase will have to leave HDFS tracing disabled, or change how they're configuring HDFS, if they wish to avoid double delivery of spans.

          We're doing a new release of HTrace soon... like this week or the next. If we can get the deduping into the 3.2 release, we can bump the version in Hadoop 2.7.1. We can't change what's in Hadoop 2.7.0, that release is done.

          Thanks again for trying this stuff out. I'm going to work on a deduping patch for HTrace, would appreciate a review.

          Show
          cmccabe Colin P. McCabe added a comment - The hadoop.htrace.span.receiver.classes is not set in Accumulo configuration files, but it is set in Hadoop configuration files. Accumulo uses Hadoop configuration files to connect to HDFS, thus its uses of DFSClient will have Hadoop's hadoop.htrace.span.receiver.classes. HBase does something similar, I believe. The way Cloudera Manager manages configuration files is that it creates separate config files for each daemon. So the NameNode reads its own set of config files, the DataNode reads a separate set, Hive reads another set, Flume reads still another set, etc. etc. So hadoop.htrace.span.receiver.classes would be set in the NN and DN configuration files, but not in the ones targetted towards the DFSClients. Does Ambari do something similar? It seems like using the same set of configuration files for everything would be very limiting, if you wanted to do something like turn on short circuit for some clients but not for others, etc. I know from a developer perspective it's frustrating to not be able to use the same config files for every daemon (I like to do that myself) but it's not broken, just inconvenient. No. The way it works (did work, until this change was introduced in DFSClient) is that server processes instantiate SpanReceiverHost. If an app wants tracing, it also has to instantiate SpanReceiverHost. The Accumulo client does not instantiate SPH itself, as DFSClient should not. It's not true that only server processes need tracing. Clients also need tracing. For example, one test I do a lot is to run FsShell with tracing turned on. This would not be possible if only servers had tracing. The point that I was making with my example is that the Accumulo client itself probably should have tracing too, and this would potentially conflict with another server using the Accumulo client. The change in DFSClient changes how apps are supposed to use tracing. It seems like this would be mitigated by deduping SpanReceivers in htrace, but if we go that route I would like the DFSClient change to be reverted until HDFS moves to a version of htrace with deduping. Otherwise, Accumulo and HBase will have to leave HDFS tracing disabled, or change how they're configuring HDFS, if they wish to avoid double delivery of spans. We're doing a new release of HTrace soon... like this week or the next. If we can get the deduping into the 3.2 release, we can bump the version in Hadoop 2.7.1. We can't change what's in Hadoop 2.7.0, that release is done. Thanks again for trying this stuff out. I'm going to work on a deduping patch for HTrace, would appreciate a review.
          Hide
          billie.rinaldi Billie Rinaldi added a comment -

          The hadoop.htrace.span.receiver.classes is not set in Accumulo configuration files, but it is set in Hadoop configuration files. Accumulo uses Hadoop configuration files to connect to HDFS, thus its uses of DFSClient will have Hadoop's hadoop.htrace.span.receiver.classes. HBase does something similar, I believe.

          Plus, it just kicks the problem up to a higher level. If my FooProcess wants to use both HTrace and Accumulo, FooProcess could easily make the same argument that "Accumulo should not instantiate SpanReceiverHost" since FooProcess is already doing that. And since FooProcess uses the accumulo client, it would conflict with whatever accumulo was configuring, if the same config file was used for everything.

          No. The way it works (did work, until this change was introduced in DFSClient) is that server processes instantiate SpanReceiverHost. If an app wants tracing, it also has to instantiate SpanReceiverHost. The Accumulo client does not instantiate SPH itself, as DFSClient should not.

          One thing we could do to make this a little less painful is to deduplicate span receivers inside the library. So if both DFSClient and Accumlo requested an HTracedSpanReceiver, we could simply create one instance of that. This would allow us to use the same config file for everything.

          The change in DFSClient changes how apps are supposed to use tracing. It seems like this would be mitigated by deduping SpanReceivers in htrace, but if we go that route I would like the DFSClient change to be reverted until HDFS moves to a version of htrace with deduping. Otherwise, Accumulo and HBase will have to leave HDFS tracing disabled, or change how they're configuring HDFS, if they wish to avoid double delivery of spans.

          Show
          billie.rinaldi Billie Rinaldi added a comment - The hadoop.htrace.span.receiver.classes is not set in Accumulo configuration files, but it is set in Hadoop configuration files. Accumulo uses Hadoop configuration files to connect to HDFS, thus its uses of DFSClient will have Hadoop's hadoop.htrace.span.receiver.classes. HBase does something similar, I believe. Plus, it just kicks the problem up to a higher level. If my FooProcess wants to use both HTrace and Accumulo, FooProcess could easily make the same argument that "Accumulo should not instantiate SpanReceiverHost" since FooProcess is already doing that. And since FooProcess uses the accumulo client, it would conflict with whatever accumulo was configuring, if the same config file was used for everything. No. The way it works (did work, until this change was introduced in DFSClient) is that server processes instantiate SpanReceiverHost. If an app wants tracing, it also has to instantiate SpanReceiverHost. The Accumulo client does not instantiate SPH itself, as DFSClient should not. One thing we could do to make this a little less painful is to deduplicate span receivers inside the library. So if both DFSClient and Accumlo requested an HTracedSpanReceiver, we could simply create one instance of that. This would allow us to use the same config file for everything. The change in DFSClient changes how apps are supposed to use tracing. It seems like this would be mitigated by deduping SpanReceivers in htrace, but if we go that route I would like the DFSClient change to be reverted until HDFS moves to a version of htrace with deduping. Otherwise, Accumulo and HBase will have to leave HDFS tracing disabled, or change how they're configuring HDFS, if they wish to avoid double delivery of spans.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Thanks again for kicking the tires on htrace, Billie Rinaldi. Let me see if I can get to the bottom of this.

          As documented, each process must configure its own span receivers if it wants to use tracing. If I set hadoop.htrace.span.receiver.classes to the empty string, then the NameNode and DataNode will not do any tracing.

          You are right that you need to set hadoop.htrace.span.receiver.classes in the NameNode and DataNode configuration. However, you need to avoid setting it in the Accumulo configuration... instead, use whatever configuration Accumulo uses to set this value. This means that you can't use the same config file for the NN and DN as for the DFSClient, currently.

          If span receiver initialization in DFSClient is important to the use of the hadoop.htrace.sampler configuration property, perhaps a compromise would be to perform SpanReceiverHost.getInstance only when the sampler is set to something other than NeverSampler.

          Keep in mind that hadoop.htrace.sampler is a completely different configuration key than hadoop.htrace.span.receiver.classes. If you are sampling at the level of Accumulo operations, I would not recommend setting hadoop.htrace.sampler, in any config file on the cluster. You want all of the sampling to happen inside accumulo.

          I think Billie Rinaldi is correct here; the client should not instantiate it's own SpanReceiverHost, but instead depend on the process in which it resides to provide. This is how HBase client works as well.

          HBase is exactly the same. In the case of HBase, you do not want to set hadoop.htrace.span.receiver.classes in the HBase config files. Instead, you would set hbase.htrace.span.receiver.classes. Then HBase would create a span receiver, and DFSClient would not.

          It seems like there is a hidden assumption here that you want to use the same config file for everything. But we really don't support that right now.

          Getting rid of the SpanReceiverHost in DFSClient is not an option since some people want to just trace HDFS without tracing any other system. Plus, it just kicks the problem up to a higher level. If my FooProcess wants to use both HTrace and Accumulo, FooProcess could easily make the same argument that "Accumulo should not instantiate SpanReceiverHost" since FooProcess is already doing that. And since FooProcess uses the accumulo client, it would conflict with whatever accumulo was configuring, if the same config file was used for everything.

          One thing we could do to make this a little less painful is to deduplicate span receivers inside the library. So if both DFSClient and Accumlo requested an HTracedSpanReceiver, we could simply create one instance of that. This would allow us to use the same config file for everything.

          As a side note, Billie Rinaldi, can you explain how you configure which sampler and span receiver accumulo uses? In HBase we set it to hbase.htrace.span.receiver.classes, etc. I would recommend something like accumulo.htrace.span.receiver.classes for consistency. This also allows you to sue the same config file for everything since it doesn't conflict with the keys which Hadoop uses to set these values. That is the reason why we set up the "hbase.htrace" "namespace" as separate from the "hadoop.htrace" "namespace" if you see what I'm saying.

          Show
          cmccabe Colin P. McCabe added a comment - Thanks again for kicking the tires on htrace, Billie Rinaldi . Let me see if I can get to the bottom of this. As documented, each process must configure its own span receivers if it wants to use tracing. If I set hadoop.htrace.span.receiver.classes to the empty string, then the NameNode and DataNode will not do any tracing. You are right that you need to set hadoop.htrace.span.receiver.classes in the NameNode and DataNode configuration. However, you need to avoid setting it in the Accumulo configuration... instead, use whatever configuration Accumulo uses to set this value. This means that you can't use the same config file for the NN and DN as for the DFSClient, currently. If span receiver initialization in DFSClient is important to the use of the hadoop.htrace.sampler configuration property, perhaps a compromise would be to perform SpanReceiverHost.getInstance only when the sampler is set to something other than NeverSampler. Keep in mind that hadoop.htrace.sampler is a completely different configuration key than hadoop.htrace.span.receiver.classes . If you are sampling at the level of Accumulo operations, I would not recommend setting hadoop.htrace.sampler , in any config file on the cluster. You want all of the sampling to happen inside accumulo. I think Billie Rinaldi is correct here; the client should not instantiate it's own SpanReceiverHost, but instead depend on the process in which it resides to provide. This is how HBase client works as well. HBase is exactly the same. In the case of HBase, you do not want to set hadoop.htrace.span.receiver.classes in the HBase config files. Instead, you would set hbase.htrace.span.receiver.classes . Then HBase would create a span receiver, and DFSClient would not. It seems like there is a hidden assumption here that you want to use the same config file for everything. But we really don't support that right now. Getting rid of the SpanReceiverHost in DFSClient is not an option since some people want to just trace HDFS without tracing any other system. Plus, it just kicks the problem up to a higher level. If my FooProcess wants to use both HTrace and Accumulo, FooProcess could easily make the same argument that "Accumulo should not instantiate SpanReceiverHost" since FooProcess is already doing that. And since FooProcess uses the accumulo client, it would conflict with whatever accumulo was configuring, if the same config file was used for everything. One thing we could do to make this a little less painful is to deduplicate span receivers inside the library. So if both DFSClient and Accumlo requested an HTracedSpanReceiver, we could simply create one instance of that. This would allow us to use the same config file for everything. As a side note, Billie Rinaldi , can you explain how you configure which sampler and span receiver accumulo uses? In HBase we set it to hbase.htrace.span.receiver.classes , etc. I would recommend something like accumulo.htrace.span.receiver.classes for consistency. This also allows you to sue the same config file for everything since it doesn't conflict with the keys which Hadoop uses to set these values. That is the reason why we set up the "hbase.htrace" "namespace" as separate from the "hadoop.htrace" "namespace" if you see what I'm saying.
          Hide
          ndimiduk Nick Dimiduk added a comment -

          I think Billie Rinaldi is correct here; the client should not instantiate it's own SpanReceiverHost, but instead depend on the process in which it resides to provide. This is how HBase client works as well.

          Show
          ndimiduk Nick Dimiduk added a comment - I think Billie Rinaldi is correct here; the client should not instantiate it's own SpanReceiverHost, but instead depend on the process in which it resides to provide. This is how HBase client works as well.
          Hide
          billie.rinaldi Billie Rinaldi added a comment -

          If span receiver initialization in DFSClient is important to the use of the hadoop.htrace.sampler configuration property, perhaps a compromise would be to perform SpanReceiverHost.getInstance only when the sampler is set to something other than NeverSampler.

          Show
          billie.rinaldi Billie Rinaldi added a comment - If span receiver initialization in DFSClient is important to the use of the hadoop.htrace.sampler configuration property, perhaps a compromise would be to perform SpanReceiverHost.getInstance only when the sampler is set to something other than NeverSampler.
          Hide
          billie.rinaldi Billie Rinaldi added a comment -

          As documented, each process must configure its own span receivers if it wants to use tracing. If I set hadoop.htrace.span.receiver.classes to the empty string, then the NameNode and DataNode will not do any tracing.

          Show
          billie.rinaldi Billie Rinaldi added a comment - As documented, each process must configure its own span receivers if it wants to use tracing. If I set hadoop.htrace.span.receiver.classes to the empty string, then the NameNode and DataNode will not do any tracing.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Hi Billie,

          DFSClient needs to instantiate SpanReceiverHost in order to implement tracing, in the case where the process using the DFSClient doesn't configure its own span receivers.

          If you are concerned about multiple span receivers being instantiated, simply set hadoop.htrace.span.receiver.classes to the empty string, and Hadoop won't instantiate any span receivers. That should be its default anyway.

          Show
          cmccabe Colin P. McCabe added a comment - Hi Billie, DFSClient needs to instantiate SpanReceiverHost in order to implement tracing, in the case where the process using the DFSClient doesn't configure its own span receivers. If you are concerned about multiple span receivers being instantiated, simply set hadoop.htrace.span.receiver.classes to the empty string, and Hadoop won't instantiate any span receivers. That should be its default anyway.

            People

            • Assignee:
              cmccabe Colin P. McCabe
              Reporter:
              billie.rinaldi Billie Rinaldi
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development