Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Data Collection
    • Labels:
      None
    • Release Note:
      REST API for the Agent. Supports CRUD operations for Adaptors, as well as Adaptor data flow statistics.

      Description

      Develop a REST interface for the Agent to expose Adaptor CRUD operations.

      • Request URI:
        GET /rest/v1/adaptor HTTP/1.0
        
      • For now I'm aiming for XML and plain-text responses, but ultimately we can support an optional param:
        viewType=[json|xml|text] (default is XML?)
        

      I'm planning on using Jetty similar to how the collector does. We could have some common code that delegates requests to different handlers based on the URI. The current telnet interface will remain as-is.
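
A minimal client-side sketch of the request URI and the proposed viewType parameter described above. The helper name `adaptor_url` and the base URL passed to it are illustrative assumptions, not part of the patch; only the path and the three viewType values come from this description.

```python
from urllib.parse import urlencode

ADAPTOR_PATH = "/rest/v1/adaptor"       # path from the request URI above
VIEW_TYPES = ("json", "xml", "text")    # values proposed for viewType


def adaptor_url(base, view_type=None):
    """Build the adaptor listing URL (hypothetical helper, not from the patch)."""
    if view_type is not None and view_type not in VIEW_TYPES:
        raise ValueError("unsupported viewType: %r" % (view_type,))
    url = base.rstrip("/") + ADAPTOR_PATH
    if view_type is not None:
        # Leaving viewType out lets the server pick its default view.
        url += "?" + urlencode({"viewType": view_type})
    return url
```

Omitting `view_type` leaves the choice of default (XML, per the question above) to the server.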

      1. CHUKWA-515-1.patch
        51 kB
        Bill Graham
      2. CHUKWA-515-2.patch
        52 kB
        Bill Graham
      3. CHUKWA-515-3.patch
        53 kB
        Bill Graham
      4. CHUKWA-515-4.patch
        49 kB
        Bill Graham

        Activity

        Bill Graham added a comment -

        Committed.

        Bill Graham added a comment -

        Here's patch 4, which uses Jersey. The only functional change from the last one is that the add command now takes an optional viewType querystring just like the get command. The default for both add and get is to return XML, but text is also supported. I've also added the ability to inject other user-developed servlets into the REST server (similar to what's done on the collector) by setting the chukwaAgent.http.rest.controller.packages config. This appends to the default org.apache.hadoop.chukwa.datacollection.agent.rest package.

        Bill Graham added a comment -

        Looking into refactoring using Jersey I meant to say.

        Bill Graham added a comment -

        Attaching patch 3, which uses port 9090, handles content-type better and returns the same XML for a POST (add adaptor) as would be returned by a GET by id.

        Looking into refactoring to use Jetty.

        Eric Yang added a comment -

        Minor issues:

        1. The POST command does not return the adaptor ID. It should return the ID so that the remote side can track the adaptor.
        2. The AdaptorServlet implements the REST API from scratch. It would be nice if it used a Jersey-style REST API. I ran into a problem with Content-Type: application/json. My Firefox REST Client adds a charset after the Content-Type, i.e.

        Content-Type: application/json; charset=UTF-8
        

        AdaptorServlet chokes on the charset addition.

        3. With a Jersey-style REST API, it would be easier to manage RPC data structure versioning and to support multiple serialization formats (XML, JSON, Protocol Buffers, Avro, etc.).
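
The charset problem in item 2 comes down to comparing the whole header value against a media type. A small sketch of separating the media type from its parameters before comparing is shown here; the function names are hypothetical, and this is not the code used in AdaptorServlet or Jersey.

```python
def parse_content_type(header_value):
    """Split a Content-Type value into (media_type, params)."""
    parts = header_value.split(";")
    media_type = parts[0].strip().lower()
    params = {}
    for part in parts[1:]:
        name, sep, value = part.partition("=")
        if sep:
            # Parameter names are case-insensitive; drop optional quotes.
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type, params


def is_json(header_value):
    """True when the media type is application/json, whatever the charset."""
    return parse_content_type(header_value)[0] == "application/json"
```

This simple split does not handle a quoted parameter value that itself contains a semicolon, which is rare for Content-Type headers.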

        Eric Yang added a comment -

        SocketTeeWriter is already using port 9094. How about using 9090 as the default agent HTTP port?

        Bill Graham added a comment -

        Adding CHUKWA-515-2.patch. The last patch was still requiring a Cluster param for POSTs, which wasn't being used. (Let's handle per-adaptor Cluster support in a separate JIRA.)

        Bill Graham added a comment -

        OK, it sounds like we're on the same page then, since the current implementation can return all adaptors or just one by id.

        Eric Yang added a comment -

        Using /adaptor is good. I think it would be nice if the returned stats could be either a list covering all adaptors or, when an adaptor id is passed, the stats of a single adaptor. I use hundreds of adaptors, so it's better if the API supports viewing the stats of one adaptor.

        Bill Graham added a comment -

        The current patch has

        /rest/v1/adaptor

        for a list of all adaptors with the metadata and stats for each

        and

        /rest/v1/adaptor/[adaptor_id]

        for a single adaptor's metadata and stats. Is this what you're looking for? (The stats are few so I rolled them up into the base /adaptor resource.
        It seemed like overkill to have both /adaptor and /adaptorStats or /adaptor/stats.)

        Are you requesting we change the URI of the resources, or changing what's returned for each of the two being discussed?
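
As a rough illustration of the two read resources described in this comment, the sketch below classifies a request path as either the list resource or a single-adaptor resource. The regular expression and function name are assumptions for illustration; the patch itself may route requests differently.

```python
import re

# One pattern for both forms: the trailing id segment is optional.
_ADAPTOR_ROUTE = re.compile(r"^/rest/v1/adaptor(?:/([^/]+))?/?$")


def match_adaptor_route(path):
    """Return ("list", None), ("single", adaptor_id), or None if no match."""
    m = _ADAPTOR_ROUTE.match(path)
    if m is None:
        return None
    adaptor_id = m.group(1)
    return ("single", adaptor_id) if adaptor_id else ("list", None)
```

With this pattern a path such as /rest/v1/adaptor/stats would be read as an adaptor whose id is "stats", which is one practical argument for rolling the stats into the base resource.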

        Eric Yang added a comment -

        It would be nice to have "Cluster" configurable through adaptor. However, I don't need this currently. The API looks good. Would it be possible to have

        /rest/v1/adaptor/stats

        for summary

        and

        /rest/v1/adaptor/stats/[adaptor_id]

        for detailed stats?

        Bill Graham added a comment -

        Attaching CHUKWA-515-1.patch, which has full CRUD support. The read API is as described above. The add and delete APIs can be tested with commands like this:

        # AdaptorParams is optional in the event that the adaptor doesn't take params.
        # Offset is optional and defaults to 0
        curl -d '{ "DataType" : "TestDataType", "AdaptorClass" : "org.apache.hadoop.chukwa.util.ConstRateAdaptor", "AdaptorParams" : "1000", "Offset" : "0" }' -H "Content-Type: application/json" http://localhost:9094/rest/v1/adaptor
        
        curl -X DELETE http://localhost:9094/rest/v1/adaptor/[adaptor_id]
        

        I've included a generic OffsetStatsManager in org.apache.hadoop.chukwa.datacollection that can be used by anything that tracks data by offset from a given point. This could be useful on the collector if we were to implement a similar collector REST API.

        Let me know if you have any comments about the API, the request format, the response format, variable names, whatever.
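
For scripted testing, the add and delete calls shown above as curl commands could be built with the Python standard library roughly as follows. The function names are hypothetical; the JSON field names, host, and port 9094 are taken from the curl commands (a later patch in this thread moves the agent to port 9090). The requests are only constructed here, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9094/rest/v1/adaptor"  # host and port as in the curl commands


def add_adaptor_request(data_type, adaptor_class, adaptor_params=None, offset=None):
    """Build (but do not send) the POST that adds an adaptor."""
    body = {"DataType": data_type, "AdaptorClass": adaptor_class}
    if adaptor_params is not None:   # optional: some adaptors take no params
        body["AdaptorParams"] = adaptor_params
    if offset is not None:           # optional: the server defaults it to 0
        body["Offset"] = str(offset)
    return Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_adaptor_request(adaptor_id):
    """Build (but do not send) the DELETE for one adaptor."""
    return Request(BASE_URL + "/" + adaptor_id, method="DELETE")
```

Passing either request object to `urllib.request.urlopen` would send it to a running agent.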

        Bill Graham added a comment -

        I personally don't need this feature. It was included in Eric's original rough spec in the email chain referenced in CHUKWA-514 which got me thinking about it. Eric, do you need this? I thought it might be convenient to be able to write to a different cluster, but if it's difficult to make that happen, I won't be the one to push for this functionality.

        Ari Rabkin added a comment -

        Uhm. There isn't a clean way to do that. The original vision was that "cluster" was more general than machine and would be not only agent-wide but cluster-wide.

        There's a second problem, which is that right now, Chunk tags can only be added, not updated. This is a Bad Thing, I think. Middling annoying to fix.

        How badly do you need this feature, and why?

        Bill Graham added a comment -

        Cool, thanks Ari. I'm close to having a patch with all CRUD operations. The one snag I've hit that you might be able to help with though, is how to add a new adaptor bound to a given "Cluster"? Currently the cluster tag is configured on the Agent for all adaptors, but I can't seem to figure out how to override the default Cluster on a per-adaptor basis.

        Ari Rabkin added a comment -

        This all sounds good.

        Bill Graham added a comment -

        I'm thinking that the /adaptorStats functionality can be merged into the /adaptor REST resource and we just show the stats with the adaptor. I've implemented the REST GET functionality for all adaptors and a single adaptor by id. The former is shown below (xml and text). The latter would look similar, just without the <Adaptors> element.

        <Response>
          <Adaptors total="1">
            <Adaptor id="adaptor_c14aa68e64bf0f12be76ce91e7f2e20d"
                     dataType="some-data-type" offset="51394422">
              <AdaptorClass>
                org.apache.hadoop.chukwa.datacollection.adaptor.jms.JMSAdaptor
              </AdaptorClass>
              <AdaptorParams>
                some-data-type tcp://jms.host.com:61616 -t queue.name
              </AdaptorParams>
              <AverageRate intervalSeconds="60">17784.66</AverageRate>
              <AverageRate intervalSeconds="300">17733.91</AverageRate>
              <AverageRate intervalSeconds="600">17679.28</AverageRate>
            </Adaptor>
          </Adaptors>
        </Response>
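
A consumer of the XML view might read it along these lines. This is only a sketch: the element and attribute names are taken from the sample response above, while the function name and the shape of the returned dictionaries are assumptions.

```python
import xml.etree.ElementTree as ET


def parse_adaptors(xml_text):
    """Return one dict per <Adaptor> element found in the response."""
    root = ET.fromstring(xml_text)
    adaptors = []
    # iter() also covers a response without the <Adaptors> wrapper.
    for node in root.iter("Adaptor"):
        adaptors.append({
            "id": node.get("id"),
            "data_type": node.get("dataType"),
            "offset": int(node.get("offset")),
            "adaptor_class": node.findtext("AdaptorClass", "").strip(),
            "adaptor_params": node.findtext("AdaptorParams", "").strip(),
            # Map each interval in seconds to its average rate.
            "average_rates": {
                int(rate.get("intervalSeconds")): float(rate.text)
                for rate in node.findall("AverageRate")
            },
        })
    return adaptors
```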
        

        For the text view, I'm using YAML:

        adaptor_count: 1
        adaptors: 
          - adaptor_id: adaptor_c14aa68e64bf0f12be76ce91e7f2e20d
            data_type: some-data-type
            offset: 51355632
            adaptor_class: org.apache.hadoop.chukwa.datacollection.adaptor.jms.JMSAdaptor
            adaptor_params: some-data-type tcp://jms.host.com:61616 -t queue.name
            average_rates: 
              - rate: 17784.66
                interval: 60
              - rate: 17733.91
                interval: 300
              - rate: 17679.28
                interval: 600
        

        I implemented a timer that takes snapshots every 10 seconds and saves up to 15 minutes worth of data per adaptor. To compute stats I have a StatsManager that requires a recent data point within 0.25*interval (to assure data is not stale) and an older data point within 0.25*interval of the recent data point - interval (to assure adequate history).
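
The two freshness checks described above can be sketched as follows. This is an illustration under stated assumptions rather than the StatsManager code in the patch: the function name is hypothetical, snapshots are assumed to be (timestamp in seconds, offset) pairs sorted oldest first, and picking the snapshot closest to the target time is a choice made for this example.

```python
def average_rate(snapshots, interval, now):
    """Average rate over `interval` seconds, or None if history is inadequate.

    `snapshots` is a list of (timestamp_seconds, offset) pairs, oldest first.
    """
    if not snapshots:
        return None
    tolerance = 0.25 * interval
    recent_ts, recent_offset = snapshots[-1]
    if now - recent_ts > tolerance:
        return None                  # newest point is stale
    target = recent_ts - interval
    # Use the snapshot closest to one interval before the recent point.
    old_ts, old_offset = min(snapshots, key=lambda s: abs(s[0] - target))
    if abs(old_ts - target) > tolerance or old_ts >= recent_ts:
        return None                  # not enough history
    return (recent_offset - old_offset) / (recent_ts - old_ts)
```

The result is in offset units per second; returning None lets the caller omit a rate for an interval instead of reporting a misleading number.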

        Comments?

        Bill Graham added a comment -

        Thanks for the feedback Ari.

        I was also concerned about holding on to too many data points, but was thinking I'd use the write of a checkpoint as my timer, which I thought occurred at a reportCommit invocation. (Maybe that's not the case?) The thought was to not hold more than 5 minutes' worth, which would be 60 data points per adaptor if checkpoints happened every 5 seconds. Do the checkpoints happen at a constant interval? If not, or if it's too frequent, I could set up a timer as you suggest. This would allow me to go back longer in time and save less frequently perhaps.

        Ari Rabkin added a comment -

        Bill – I see one problem with your approach. If an adaptor puts out a large number of chunks, your approach requires storing a potentially significant amount of data.

        It's a pretty easy fix. Instead of caching a timestamp+offset on every commit or every send, you could have a timer go off every N seconds, and cache a timestamp-offset pair for each adaptor only on the timer tick. This way, the memory required is constant per adaptor. The tradeoff is that the average will be very slightly stale, but you can put a bound on how stale, by picking a suitably low value for N.
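
One way to picture the timer-tick idea is a fixed-size buffer per adaptor, sketched below. The class and method names are hypothetical; the 10-second period and 15-minute window are the values Bill reports using in his follow-up comment, not something this comment specifies.

```python
import time
from collections import deque

SNAPSHOT_SECONDS = 10                               # tick period N
HISTORY_SECONDS = 15 * 60                           # how far back to keep
MAX_POINTS = HISTORY_SECONDS // SNAPSHOT_SECONDS    # points kept per adaptor


class OffsetHistory:
    """Keeps a bounded list of (timestamp, offset) pairs per adaptor."""

    def __init__(self, max_points=MAX_POINTS):
        self._max_points = max_points
        self._points = {}   # adaptor id -> deque of (timestamp, offset)

    def tick(self, current_offsets, now=None):
        """Record one pair per adaptor; meant to be called once per timer tick."""
        now = time.time() if now is None else now
        for adaptor_id, offset in current_offsets.items():
            history = self._points.setdefault(
                adaptor_id, deque(maxlen=self._max_points))
            history.append((now, offset))   # oldest pair drops off when full

    def history(self, adaptor_id):
        """Return the stored pairs for one adaptor, oldest first."""
        return list(self._points.get(adaptor_id, ()))
```

Because each deque has a fixed maximum length, memory per adaptor stays constant no matter how many chunks the adaptor emits between ticks.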

        Bill Graham added a comment -

        I'd like to return the 1 minute and 5 minute average data rates for each adaptor.

        To do this I was planning on saving off the last 5 minutes worth of timestamp/offset pairs for each adaptor, so stats could be generated. This would be done in the ChukwaAgent in the reportCommit method. Let me know if you see problems with this approach, or if there's a better way to go about this.

        Bill Graham added a comment -

        Just noticed that my REST resource might not be entirely accurate. If these are stats of the agent as a whole (as opposed to stats broken out for each adaptor), then we should use /rest/v1/agentStats instead.


          People

          • Assignee:
            Bill Graham
            Reporter:
            Bill Graham