Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: metrics
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      New server web page .../metrics allows convenient access to metrics data via JSON and text.

      Description

      Implement a "/metrics" URL on the HTTP server of Hadoop daemons, to expose metrics data to users via their web browsers, in plain-text and JSON.

      1. HADOOP-5469.patch
        14 kB
        Philip Zeyliger
      2. HADOOP-5469.patch
        24 kB
        Philip Zeyliger

        Activity

        Philip Zeyliger created issue -
        Philip Zeyliger made changes -
        Field Original Value New Value
        Attachment HADOOP-5469.patch [ 12402000 ]
        Philip Zeyliger made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Philip Zeyliger made changes -
        Description I'd like to be able to query Hadoop's metrics via HTTP, e.g., by going to "/metrics" on any Hadoop daemon that has an HttpServer. My motivation is pretty simple--if you're running on a lot of machines, tracking down the relevant metrics files is pretty time-consuming; this would be a useful debugging utility. I'd also like the output to be parseable, so I could write a quick web app to query the metrics dynamically.

        This is similar in spirit to, but different from, just using JMX. (See also HADOOP-4756.) JMX requires a client, and, more annoyingly, JMX requires setting up authentication. If you just disable authentication, someone can do Bad Things, and if you enable it, you have to worry about yet another password. The "/metrics" page is also more complete: JMX requires separate instrumentation, so, for example, the JobTracker's metrics aren't exposed via JMX.

        To get the discussion going, I've attached a patch. I had to add a method to ContextFactory to get all the active MetricsContexts, implement a do-little MetricsContext that simply inherits from AbstractMetricsContext, add a method to MetricsContext to get all the records, expose copy methods for the maps in OutputRecord, and implement an easy servlet. I ended up removing some common code from all MetricsContexts, for setting the period; I'm open to taking that out if it muddies the patch significantly.
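
        The traversal described above (every active context, then every record in it, then each record's tag and metric maps) can be modeled in a few lines. This is a rough sketch in Python, not the patch's Java code: the `snapshot` layout and the names `render_text` and `render_json` are hypothetical, chosen only for illustration. The text layout mirrors the sample output in this issue; the JSON shape is purely an assumption.

```python
import json


def render_text(snapshot):
    """Render {context: {record: [(tags, metrics), ...]}} as indented text."""
    lines = []
    for context in sorted(snapshot):
        lines.append(context)
        for record in sorted(snapshot[context]):
            lines.append("  " + record)
            for tags, metrics in snapshot[context][record]:
                # Tags and metrics are printed in sorted key order.
                tag_text = ", ".join(f"{k}={tags[k]}" for k in sorted(tags))
                lines.append("    {" + tag_text + "}")
                for name in sorted(metrics):
                    lines.append(f"      {name}={metrics[name]}")
    return "\n".join(lines)


def render_json(snapshot):
    """Serialize the same structure as JSON (assumed shape, for illustration)."""
    return json.dumps(snapshot, sort_keys=True)


# Hypothetical in-memory snapshot, using values from the sample output.
snapshot = {
    "mapred": {
        "jobtracker": [
            (
                {"hostName": "doorstop.local", "sessionId": ""},
                {"jobs_submitted": 1, "jobs_completed": 0},
            )
        ]
    }
}

print(render_text(snapshot))
print(render_json(snapshot))
```

        Keeping one in-memory structure and rendering it twice means the text and JSON views cannot drift apart.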

        I'd love to hear your suggestions. There's a bug in the JSON representation, and there's some gross type-handling.

        The patch is missing tests. I wanted to post to gather feedback before I got too far, but tests are forthcoming.

        Here's a sample output for a job tracker, while it was running a "pi" job:

        {noformat}
        jvm
          metrics
            {hostName=doorstop.local, processName=JobTracker, sessionId=}
              gcCount=22
              gcTimeMillis=68
              logError=0
              logFatal=0
              logInfo=52
              logWarn=0
              memHeapCommittedM=7.4375
              memHeapUsedM=4.2150116
              memNonHeapCommittedM=23.1875
              memNonHeapUsedM=18.438614
              threadsBlocked=0
              threadsNew=0
              threadsRunnable=7
              threadsTerminated=0
              threadsTimedWaiting=8
              threadsWaiting=15
        mapred
          job
            {counter=Map input records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=2.0
            {counter=Map output records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=4.0
            {counter=Data-local map tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=4.0
            {counter=Map input bytes, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=48.0
            {counter=FILE_BYTES_WRITTEN, group=FileSystemCounters, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=148.0
            {counter=Combine output records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=0.0
            {counter=Launched map tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=4.0
            {counter=HDFS_BYTES_READ, group=FileSystemCounters, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=236.0
            {counter=Map output bytes, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=64.0
            {counter=Launched reduce tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=1.0
            {counter=Spilled Records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=4.0
            {counter=Combine input records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
              value=0.0
          jobtracker
            {hostName=doorstop.local, sessionId=}
              jobs_completed=0
              jobs_submitted=1
              maps_completed=2
              maps_launched=5
              reduces_completed=0
              reduces_launched=1
        rpc
          metrics
            {hostName=doorstop.local, port=50030}
              NumOpenConnections=2
              RpcProcessingTime_avg_time=0
              RpcProcessingTime_num_ops=84
              RpcQueueTime_avg_time=1
              RpcQueueTime_num_ops=84
              callQueueLen=0
              getBuildVersion_avg_time=0
              getBuildVersion_num_ops=1
              getJobProfile_avg_time=0
              getJobProfile_num_ops=17
              getJobStatus_avg_time=0
              getJobStatus_num_ops=32
              getNewJobId_avg_time=0
              getNewJobId_num_ops=1
              getProtocolVersion_avg_time=0
              getProtocolVersion_num_ops=2
              getSystemDir_avg_time=0
              getSystemDir_num_ops=2
              getTaskCompletionEvents_avg_time=0
              getTaskCompletionEvents_num_ops=19
              heartbeat_avg_time=5
              heartbeat_num_ops=9
              submitJob_avg_time=0
              submitJob_num_ops=1
        {noformat}
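
        The plain-text layout in the sample above is regular enough to parse by indentation: context, then record, then a `{tag=value, ...}` line, then `name=value` metric lines. The sketch below is one possible reading of that layout, not part of the patch; `parse_metrics_text` and `parse_tags` are hypothetical helper names. It assumes two-space indentation steps, numeric metric values, and tag values that never contain ", ".

```python
import textwrap


def parse_tags(line):
    """Parse a '{k=v, k=v}' tag line into a dict of strings."""
    inner = line.strip()[1:-1]  # drop the surrounding braces
    tags = {}
    for part in inner.split(", "):
        if part:
            key, _, value = part.partition("=")
            tags[key] = value
    return tags


def parse_metrics_text(text):
    """Return {context: {record: [(tags, metrics), ...]}} from the text dump."""
    result = {}
    context = record = metrics = None
    # dedent removes any common leading indentation added by the page.
    for raw in textwrap.dedent(text).splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        line = raw.strip()
        if indent == 0:
            context = line
            result[context] = {}
        elif indent == 2:
            record = line
            result[context][record] = []
        elif indent == 4:
            metrics = {}
            result[context][record].append((parse_tags(line), metrics))
        else:
            name, _, value = line.partition("=")
            metrics[name] = float(value)  # assumes every metric is numeric
    return result


# A few lines copied from the sample output above.
sample = """\
rpc
  metrics
    {hostName=doorstop.local, port=50030}
      NumOpenConnections=2
      callQueueLen=0
"""

parsed = parse_metrics_text(sample)
print(parsed["rpc"]["metrics"][0])
```

        For anything beyond quick debugging, the JSON output is likely the safer format to consume, since this text form has no escaping for tag values.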
        Implement a "/metrics" URL on the HTTP server of Hadoop daemons, to expose metrics data to users via their web browsers, in plain-text and JSON.
        Philip Zeyliger made changes -
        Attachment HADOOP-5469.patch [ 12404254 ]
        Philip Zeyliger made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Philip Zeyliger made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Philip Zeyliger made changes -
        Remaining Estimate 1.5h [ 5400 ]
        Time Spent 2h [ 7200 ]
        Owen O'Malley made changes -
        Assignee Philip Zeyliger [ philip ]
        Doug Cutting made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Fix Version/s 0.21.0 [ 12313563 ]
        Resolution Fixed [ 1 ]
        Chris Douglas made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Doug Cutting made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Robert Chansler made changes -
        Release Note New server web page .../metrics allows convenient access to metrics data via JSON and text.
        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee: Philip Zeyliger
          • Reporter: Philip Zeyliger
          • Votes: 0
          • Watchers: 8

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: Not Specified
              Remaining: 1.5h
              Logged: 2h
