Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.5.0
    • Labels:
      None

      Description

      We should have an "s4 status" command that provides nicely formatted information about the state of the S4 applications coordinated through a given Zookeeper ensemble.

      In particular:

      • which applications are deployed
      • on which S4 nodes these applications are deployed
      • how many S4 nodes are running, how many S4 nodes are in standby
      • which streams are published, and which applications are connected to them

      The implementation would probably add a new class in the s4-tools project.
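A minimal Python sketch of the aggregation such a command could perform (the real tool would live in the Java s4-tools project). It assumes the Zookeeper tree has already been read into a dict mapping znode paths to data, and that active nodes register as children of a hypothetical `/s4/clusters/<cluster>/process` znode; both the snapshot approach and that path are illustrative assumptions, and a real implementation would list children through a Zookeeper client instead.

```python
# Hypothetical snapshot of a Zookeeper tree: znode path -> data.
# The layout /s4/clusters/<cluster>/process/<node> is an assumption for
# illustration, not taken from the S4 code base.
def children(snapshot, path):
    """Return the sorted names of the direct children of `path`."""
    prefix = path.rstrip("/") + "/"
    names = set()
    for p in snapshot:
        if p.startswith(prefix):
            rest = p[len(prefix):]
            if rest:
                names.add(rest.split("/", 1)[0])
    return sorted(names)


def cluster_summary(snapshot):
    """Map each cluster name to its number of active (registered) nodes."""
    summary = {}
    for cluster in children(snapshot, "/s4/clusters"):
        nodes = children(snapshot, f"/s4/clusters/{cluster}/process")
        summary[cluster] = len(nodes)
    return summary


snapshot = {
    "/s4/clusters/cluster1/process/process-0000000000": b"",
    "/s4/clusters/cluster1/process/process-0000000001": b"",
    "/s4/clusters/cluster2/process/process-0000000000": b"",
}
print(cluster_summary(snapshot))  # {'cluster1': 2, 'cluster2': 1}
```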

        Activity

        Aimee Cheng added a comment -

        For monitoring, we are considering providing the following metrics:

        basic metrics (APP/PE/PE instance):

        • event rate
        • event processing time
        • event queue size
        • number of PE instances
        • number of processed events (maybe within a sliding window)
        • exceptions during event processing
        • actor nodes

        other:

        • APP: PE types
        • PE: keys
        • PE instance: key, TTL, survival time and other information.

        We'd like to use Codahale's Metrics library (http://metrics.codahale.com/) to measure these metrics in S4. With Metrics, the results are visible via JMX and HTTP, and can also easily be exported to many other monitoring tools.
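To illustrate what "number of processed events within a sliding window" means, here is a minimal Python sketch. It is not the Metrics library's API (Metrics is a Java library with its own meters and histograms); it simply keeps event timestamps in a deque and evicts those older than the window, assuming the caller supplies non-decreasing timestamps in seconds.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall within the last `window` seconds.

    Assumes timestamps passed to mark() are non-decreasing.
    """

    def __init__(self, window):
        self.window = window
        self._times = deque()

    def mark(self, now):
        """Record one processed event at time `now` (seconds)."""
        self._times.append(now)
        self._evict(now)

    def count(self, now):
        """Number of events in the half-open interval (now - window, now]."""
        self._evict(now)
        return len(self._times)

    def rate(self, now):
        """Average events per second over the window."""
        return self.count(now) / self.window

    def _evict(self, now):
        # Drop timestamps that have slid out of the window.
        while self._times and self._times[0] <= now - self.window:
            self._times.popleft()


counter = SlidingWindowCounter(window=10)
for t in (0, 1, 2, 11):
    counter.mark(t)
print(counter.count(11))  # 2  (events at t=2 and t=11)
```

A deque keeps both recording and eviction cheap, at the cost of storing one timestamp per event in the window; Metrics instead uses decaying averages and sampling to bound memory.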

        Matthieu Morel added a comment -

        @Aimee: Monitoring metrics deserve a separate ticket, in my opinion. Could you create a new one with your proposal?

        The S4 status tool is aimed at showing the status of the S4 clusters, applications and inter-app streams, so that a user can see what is actually running. We need that for the next release, whereas monitoring metrics would be a better fit for a later one (more focused on performance).

        Thanks!

        Matthieu Morel added a comment -

        Also, you should take into account the changes from S4-71 for listing the deployed applications per subcluster: that information is now stored in a single znode instead of a directory.
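With that change, finding the deployed app becomes a single znode read rather than a directory listing. A Python sketch under stated assumptions: the znode path `/s4/clusters/<cluster>/app/s4App` and the `name`/`uri` keys under `simpleFields` are hypothetical (modelled on the ZNRecord-style JSON records that appear elsewhere in this thread), and the Zookeeper tree is represented as a dict snapshot.

```python
import json

# Hypothetical location of the single app znode introduced by S4-71.
APP_ZNODE = "/s4/clusters/{cluster}/app/s4App"


def deployed_app(snapshot, cluster):
    """Return (name, uri) of the app deployed on `cluster`, or None if none.

    Assumes the znode payload is a JSON record with app metadata under
    "simpleFields"; the field names are illustrative.
    """
    data = snapshot.get(APP_ZNODE.format(cluster=cluster))
    if data is None:
        return None
    fields = json.loads(data).get("simpleFields", {})
    return fields.get("name"), fields.get("uri")


snapshot = {
    "/s4/clusters/cluster1/app/s4App": json.dumps(
        {"id": "app", "simpleFields": {"name": "myApp", "uri": "file:///tmp/myApp.s4r"}}
    ),
}
print(deployed_app(snapshot, "cluster1"))  # ('myApp', 'file:///tmp/myApp.s4r')
print(deployed_app(snapshot, "cluster2"))  # None
```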

        Aimee Cheng added a comment -

        I misunderstood the scope of this ticket. OK, I'll take a look at S4-71.

        Aimee Cheng added a comment -

        I ran into a few questions:

        1. About the third item in the description, "how many S4 nodes are running, how many S4 nodes are in standby": it seems that standby nodes do not register any information in Zookeeper; they only listen for changes to znodes. So we may need to add some information to Zookeeper. Or is there another way to get information about standby nodes?

        2. When publishing streams, I saw the following information about consumers/publishers:

            get /s4/streams/names/consumers/consumer-0000000002

            {
              "id": "names/cluster1/-1",
              "simpleFields": { "appId": "-1", "clusterName": "cluster1" },
              "listFields": {},
              "mapFields": {}
            }

        I am confused by the appId value of "-1". From S4-71 I understand that only one app runs in a subcluster, so I can get the app name once I know the cluster name, but I would like to know what appId means here.
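A Python sketch of how a status tool could turn such a record into a table entry, assuming the znode data is exactly the JSON shown above and that consumer znodes live under `/s4/streams/<stream>/consumers/`, as the path above suggests. The appId field is ignored, since it is a leftover from an earlier design in which a subcluster could host several apps.

```python
import json


def stream_of(path):
    """Extract the stream name from a consumer znode path.

    Assumes the layout /s4/streams/<stream>/consumers/<consumer-node>.
    """
    parts = path.strip("/").split("/")
    return parts[2]


def consumer_cluster(record_json):
    """Return the subcluster name stored in a stream consumer record.

    "appId" is deliberately ignored: it is a leftover from an earlier design
    that allowed several apps per subcluster.
    """
    record = json.loads(record_json)
    return record["simpleFields"]["clusterName"]


record = """{
  "id": "names/cluster1/-1",
  "simpleFields": {"appId": "-1", "clusterName": "cluster1"},
  "listFields": {},
  "mapFields": {}
}"""
path = "/s4/streams/names/consumers/consumer-0000000002"
print(stream_of(path), consumer_cluster(record))  # names cluster1
```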

        Matthieu Morel added a comment -

        1. Good catch. I'll have a look into that. Did you run into this issue after S4-71 was integrated in the piper branch?
        2. Again, good question. This is actually a leftover from an initial design in which we had multiple apps per subcluster. That is no longer the case, and we should remove it. In the meantime, you should ignore it.

        Aimee Cheng added a comment -

        Yes, I updated to the latest version.

        Matthieu Morel added a comment -

        About standby node info: I checked, and this is not a regression. Failover works properly even though standby nodes do not publish information in Zookeeper. It could be an interesting feature to have, though. In the meantime, you can probably skip that information until we add it.

        About appId: I created S4-76 and will fix it as soon as I can. Just ignore that id.

        Aimee Cheng added a comment -

        I added a status command. It shows two tables: one for cluster status (Cluster name, App, Tasks, Total nodes, Running nodes) and one for stream status (Stream name, Producers, Consumers).
        There are still some problems. For example, should apps that have the same name but run in different subclusters be considered the same app? For now, when showing the applications connected to a stream, I use the format cluster1(myAPP). Maybe that's not ideal.
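To make the cluster1(myAPP) labelling concrete, here is a Python sketch that renders the streams table (stream name, producers, consumers) from already-collected data. The input structure, the column widths and the app name "counter" are illustrative assumptions, not taken from the patch.

```python
def label(cluster, app):
    """Format a stream endpoint as cluster(app), e.g. cluster1(myAPP)."""
    return f"{cluster}({app})"


def streams_table(streams):
    """Render a fixed-width streams table.

    `streams` maps a stream name to a pair (producers, consumers), each a
    list of (cluster, app) tuples.
    """
    header = f"{'Stream name':<15} {'Producers':<25} Consumers"
    lines = [header, "-" * len(header)]
    for name in sorted(streams):
        producers, consumers = streams[name]
        prod = ", ".join(label(c, a) for c, a in producers)
        cons = ", ".join(label(c, a) for c, a in consumers)
        lines.append(f"{name:<15} {prod:<25} {cons}".rstrip())
    return "\n".join(lines)


print(streams_table({"names": ([("cluster1", "myAPP")], [("cluster2", "counter")])}))
```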

        Aimee Cheng added a comment -

        Also, the adapter does not register any information under /s4/cluster/clusterX/app/, so it seems difficult to get information (app names) about adapters that publish streams.

        Matthieu Morel added a comment -

        Very nice contribution! Furthermore, it is perfectly integrated with the existing codebase patterns.

        Now when showing the applications connect to a stream, I use the format like this: cluster1(myAPP).

        I think this is fine for now.

        the Adapter doesn't have information in /s4/cluster/clusterX/app/, so it seems difficult to get the information(App names) of such Adapter publishers of streams.

        I think you are referring to the adapter that you start directly, as in the walkthrough, which can be viewed as a testing facility. In a real deployment, the adapter app would appear in Zookeeper like any other S4 app.

        Comments/suggestions:

        • Total nodes is the current number of active nodes, right?
        • I understand the format of the "Running nodes" column, but a short description of that format could be useful. I'm not sure where to show it, though. In the column label?
        • It would be worth having another table focusing on apps, in particular showing app metadata. So far there is only the URI, but it would be useful to see it here. Columns could be "name, cluster, URI, (other metadata columns when available)".
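One way to support "other metadata columns when available" is to derive the columns from the metadata actually present. A Python sketch of that design, assuming each app's metadata has already been collected into a dict; the "version" field in the example is hypothetical.

```python
def apps_table(apps):
    """Render app metadata as a text table.

    Columns are name, cluster and URI, followed by any extra metadata keys
    present in at least one app (sorted). Missing values are shown as "-".
    """
    base = ["name", "cluster", "uri"]
    extra = sorted({key for app in apps for key in app} - set(base))
    columns = base + extra
    rows = [[str(app.get(col, "-")) for col in columns] for app in apps]
    # Each column is as wide as its longest cell or header, plus two spaces.
    widths = [
        max([len(col)] + [len(row[i]) for row in rows]) + 2
        for i, col in enumerate(columns)
    ]

    def fmt(cells):
        return "".join(cell.ljust(w) for cell, w in zip(cells, widths)).rstrip()

    return "\n".join([fmt([col.upper() for col in columns])] + [fmt(row) for row in rows])


apps = [
    {"name": "myApp", "cluster": "cluster1", "uri": "file:///tmp/myApp.s4r"},
    {"name": "counter", "cluster": "cluster2", "uri": "file:///tmp/counter.s4r",
     "version": "0.1"},  # hypothetical extra metadata field
]
print(apps_table(apps))
```

Apps without a given metadata key still get a row, with "-" in that column, so adding metadata to one app never hides the others.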

        What do you think?

        Thanks!

        Aimee Cheng added a comment - - edited

        Total nodes is the current number of active nodes right?

        Yes, it is currently the number of active nodes. I originally intended total nodes to include both active and standby nodes, but for now it only counts active nodes. I'll update the description for the current version.

        I'll follow the last two comments to update my code. Very nice suggestions!

        Thanks.

        Aimee Cheng added a comment -

        I updated the code. The main changes are a detailed description of the active nodes and a new app table. Some column widths have also been adjusted.

        Matthieu Morel added a comment -

        Brilliant, exactly what we needed!

        However I wonder if you could update the patch with the following:

        • it seems you are using tabs instead of spaces, which causes trouble in diffs. See the last 2 sections of http://incubator.apache.org/s4/contrib/
        • small suggestions for the command help, to make clear that app, stream and cluster are filters:
              -app           Only show status of specified S4 application(s)
              -c, -cluster   Only show status of specified S4 cluster(s)
              -gradleOpts    gradle system properties (as in GRADLE_OPTS environment
                             properties) passed to gradle scripts
                             Default: []
              -help          usage
                             Default: false
              -s, -stream    Only show status of specified published stream(s)
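The filter semantics suggested by this help text can be sketched in a few lines of Python. This is not the s4-tools option-handling code; it assumes an omitted (empty) filter means "show everything" and that several filters combine with AND. The function names and row shape are hypothetical.

```python
def matches(value, selected):
    """True if no filter was given, or if `value` is one of the selected items."""
    return not selected or value in selected


def filter_clusters(rows, apps=(), clusters=()):
    """Keep cluster-table rows allowed by both the -app and -cluster filters.

    Each row is a dict with at least "cluster" and "app" keys.
    """
    return [
        row for row in rows
        if matches(row["cluster"], clusters) and matches(row["app"], apps)
    ]


rows = [
    {"cluster": "cluster1", "app": "myApp"},
    {"cluster": "cluster2", "app": "counter"},
]
print(filter_clusters(rows, clusters=["cluster2"]))
# [{'cluster': 'cluster2', 'app': 'counter'}]
```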

        Many thanks!

        Aimee Cheng added a comment -

        Thanks! I updated the patch following your suggestions. I hope it works now.

        Matthieu Morel added a comment -

        Merged in the piper branch: https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=commit;h=ecbfd429d4eb9336ee5fa1481a7dc634fab7e2a1

        Thanks Aimee!

          People

          • Assignee:
            Aimee Cheng
            Reporter:
            Matthieu Morel
          • Votes:
            0
            Watchers:
            2
