Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:

      Description

      Once KAFKA-499 is checked in, every controller to broker communication can be modelled as a state change for one or more partitions. Every state change request will carry the controller epoch. If there is a problem with the state of some partitions, it will be good to have a tool that can create a timeline of requested and completed state changes. This will require each broker to output a state change log that has entries like

      [2012-09-10 10:06:17,280] broker 1 received request LeaderAndIsr() for partition [foo, 0] from controller 2, epoch 1
      [2012-09-10 10:06:17,350] broker 1 completed request LeaderAndIsr() for partition [foo, 0] from controller 2, epoch 1

      On controller, this will look like -

      [2012-09-10 10:06:17,198] controller 2, epoch 1, initiated state change request LeaderAndIsr() for partition [foo, 0]

      We need a tool that can collect the state change log from all brokers and create a per-partition timeline of state changes -

      [foo, 0]

      [2012-09-10 10:06:17,198] controller 2, epoch 1 initiated state change request LeaderAndIsr()
      [2012-09-10 10:06:17,280] broker 1 received request LeaderAndIsr() from controller 2, epoch 1
      [2012-09-10 10:06:17,350] broker 1 completed request LeaderAndIsr() from controller 2, epoch 1

      This JIRA involves adding the state change log to each broker and adding the tool to create the timeline

      1. kafka-513-v1.patch
        39 kB
        Swapnil Ghike
      2. kafka-513-v2.patch
        37 kB
        Swapnil Ghike
      3. kafka-513-v3.patch
        40 kB
        Swapnil Ghike
      4. kafka-513-v4.patch
        68 kB
        Swapnil Ghike
      5. kafka-513-v5.patch
        76 kB
        Swapnil Ghike
      6. kafka-513-v5-corrected.patch
        76 kB
        Swapnil Ghike

        Activity

        Hide
        Swapnil Ghike added a comment -

        Thanks for making the corrections Neha.

        Jun: Perhaps we should make these changes in another jira.

        Show
        Swapnil Ghike added a comment - Thanks for making the corrections Neha. Jun: Perhaps we should make these changes in another jira.
        Hide
        Jun Rao added a comment -

        Thanks for the patch. A couple of more comments:

        50. We added controllerId in LeaderAndIsrRequest. Should we add that to all requests sent by the controller, such as StopReplicaRequest?

        51. The string "state.change.logger" is used in many places. Should we create a constant val and reuse the val to avoid human mistakes?

        Show
        Jun Rao added a comment - Thanks for the patch. A couple of more comments: 50. We added controllerId in LeaderAndIsrRequest. Should we add that to all requests sent by the controller, such as StopReplicaRequest? 51. The string "state.change.logger" is used in many places. Should we create a constant val and reuse the val to avoid human mistakes?
        Hide
        Neha Narkhede added a comment -

        Thanks for patch v5 Swapnil, the state change log looks great! I checked in that patch after the following minor changes -

        1. Partition
        Fixed the error trace in makeFollower to include correlationId, controllerId and controllerEpoch

        2. PartitionStateMachine
        2.1 In initializeLeaderAndIsrForPartition(),
        Changed NEW -> New
        Changed ONLINE -> Online

        2.2 In electLeaderForPartition(),
        Removed the duplicate "Controller %d epoch %d"

        2.3 Removed the error from getLeaderIsrAndEpochOrThrowException() in state change log since it is already logged in the catch block of electLeaderForPartition()

        2.4 Changed trace() to error() wherever required

        3. ReplicaStateMachine
        Included an error statement in the state change log

        Show
        Neha Narkhede added a comment - Thanks for patch v5 Swapnil, the state change log looks great! I checked in that patch after the following minor changes - 1. Partition Fixed the error trace in makeFollower to include correlationId, controllerId and controllerEpoch 2. PartitionStateMachine 2.1 In initializeLeaderAndIsrForPartition(), Changed NEW -> New Changed ONLINE -> Online 2.2 In electLeaderForPartition(), Removed the duplicate "Controller %d epoch %d" 2.3 Removed the error from getLeaderIsrAndEpochOrThrowException() in state change log since it is already logged in the catch block of electLeaderForPartition() 2.4 Changed trace() to error() wherever required 3. ReplicaStateMachine Included an error statement in the state change log
        Hide
        Swapnil Ghike added a comment -

        Does it work now?

        Show
        Swapnil Ghike added a comment - Does it work now?
        Hide
        Neha Narkhede added a comment -

        I think there is some issue with patch v5. I get the following error while trying to apply the patch -

        nnarkhed-mn:kafka-git nnarkhed$ patch -p1 -i ~/Projects/kafka-patches/kafka-513-v5.patch
        patching file bin/kafka-run-class.sh
        patching file config/log4j.properties
        patch: **** malformed patch at line 39: diff --git a/core/src/main/scala/kafka/api/FetchRequest.scala b/core/src/main/scala/kafka/api/FetchRequest.scala

        Show
        Neha Narkhede added a comment - I think there is some issue with patch v5. I get the following error while trying to apply the patch - nnarkhed-mn:kafka-git nnarkhed$ patch -p1 -i ~/Projects/kafka-patches/kafka-513-v5.patch patching file bin/kafka-run-class.sh patching file config/log4j.properties patch: **** malformed patch at line 39: diff --git a/core/src/main/scala/kafka/api/FetchRequest.scala b/core/src/main/scala/kafka/api/FetchRequest.scala
        Hide
        Swapnil Ghike added a comment -

        Thank you for the detailed suggestions, they were incredibly helpful. Made changes according to last review.

        Additional comments:
        1.1 For this, the constructors of ControllerChannelManagaer, RequestToSendThread have to accept a ControllerContext.

        4, 5. Yes, it makes sense to get rid of the logIdent for state change logs. I have removed the wrapper around the Logging trait as well.

        7.1 Earlier I was thinking that the list of files should be appended if the user specified both options together. But you are right, it may be confusing and anyways the user can include more files using the regex option.

        Show
        Swapnil Ghike added a comment - Thank you for the detailed suggestions, they were incredibly helpful. Made changes according to last review. Additional comments: 1.1 For this, the constructors of ControllerChannelManagaer, RequestToSendThread have to accept a ControllerContext. 4, 5. Yes, it makes sense to get rid of the logIdent for state change logs. I have removed the wrapper around the Logging trait as well. 7.1 Earlier I was thinking that the list of files should be appended if the user specified both options together. But you are right, it may be confusing and anyways the user can include more files using the regex option.
        Hide
        Neha Narkhede added a comment -

        Thanks for the patch! This patch is great and is very close to being checked in. Few review suggestions -

        1. ControllerChannelManager
        1.1 The log message when the controller sends the data includes the controller's epoch, but the response doesn't. It is useful to know the epoch wherever the controller id is logged.
        1.2 The "," between controller id and epoch probably could be skipped, this pattern is not followed elsewhere in this patch.

        2. Partition
        2.1 The error statement when the broker is dropping the leaderAndIsr request should be -

        "since local leader epoch %d is >= the request's leader epoch"

        2.2 Also, let's add "Broker %d aborted..." to the following statement similar to "Broker %d discarded..."

        "Aborted the become-follower state change since leader %d for partition [%s,%d]"

        3. PartitionStateMachine
        In electLeaderForPartition() API, it is useful to include the contents of the StateChangedFailedException in the state change log. This is because it is useful to know that after starting the state change, it got aborted because another controller with a higher controller epoch was detected. So something like -
        "Controller %d epoch %d aborted leader election for partition [%s,%d] since ..."

        4. ReplicaManager

        The log statements here are of the form of [Replica Manager on Broker %d]: Handling... This is different from the ones in Partition which are like "Broker %d started become-follower...". I guess the default logIdent, which is meant for the main log, is the cause of this discrepancy. For the purpose of the state change log, we actually don't care if it is Replica Manager or Partition on that broker, it would just suffice to have the simplest statement to communicate the state changes like the ones in Partition except the logIdent.

        5. ReplicaStateMachine
        Same as 4

        6. Rename stateChangeLogMerger.scala to have 1st letter in caps.

        7. StateChangeLogMerger

        This tool looks great now. Thanks for including the review suggestions. Few minor observations -

        7.1 You check if at least one of the two log input options are specified. I think we should also check that at most one of them is specified. In other words, what's the expected behavior if both are speci
        fied ?

        Show
        Neha Narkhede added a comment - Thanks for the patch! This patch is great and is very close to being checked in. Few review suggestions - 1. ControllerChannelManager 1.1 The log message when the controller sends the data includes the controller's epoch, but the response doesn't. It is useful to know the epoch wherever the controller id is logged. 1.2 The "," between controller id and epoch probably could be skipped, this pattern is not followed elsewhere in this patch. 2. Partition 2.1 The error statement when the broker is dropping the leaderAndIsr request should be - "since local leader epoch %d is >= the request's leader epoch" 2.2 Also, let's add "Broker %d aborted..." to the following statement similar to "Broker %d discarded..." "Aborted the become-follower state change since leader %d for partition [%s,%d] " 3. PartitionStateMachine In electLeaderForPartition() API, it is useful to include the contents of the StateChangedFailedException in the state change log. This is because it is useful to know that after starting the state change, it got aborted because another controller with a higher controller epoch was detected. So something like - "Controller %d epoch %d aborted leader election for partition [%s,%d] since ..." 4. ReplicaManager The log statements here are of the form of [Replica Manager on Broker %d] : Handling... This is different from the ones in Partition which are like "Broker %d started become-follower...". I guess the default logIdent, which is meant for the main log, is the cause of this discrepancy. For the purpose of the state change log, we actually don't care if it is Replica Manager or Partition on that broker, it would just suffice to have the simplest statement to communicate the state changes like the ones in Partition except the logIdent. 5. ReplicaStateMachine Same as 4 6. Rename stateChangeLogMerger.scala to have 1st letter in caps. 7. StateChangeLogMerger This tool looks great now. Thanks for including the review suggestions. Few minor observations - 7.1 You check if at least one of the two log input options are specified. I think we should also check that at most one of them is specified. In other words, what's the expected behavior if both are speci fied ?
        Hide
        Swapnil Ghike added a comment -

        Uploading patch v4. The comments on the merge tool are written above. Remaining comments -

        A. Logging in kafka -
        11. tried to make the log format more consistent - used [%s,%d] for printing topicAndPartition, logged correlationId when the logging statement was concerned with leaderAndIsrRequest, printed controllerId and controllerEpoch wherever needed. I have tried to ensure that there are no mistakes in the order of parameters to .format() etc, but it would be helpful if you could also scrutinize them as well.
        12. Changed state change log logging level to trace, except for certain errors.
        13. As you mentioned, it's an excellent idea to mention in ControllerChannelManager whether the leaderAndIsr request is become-leader or become-follower request. Added a change to log that. Added a change to do that on the broker in ReplicaManager.

        {makeLeader,makeFollower}

        .
        14. Included correlationId in LeaderAndIsrRequest.toString
        15. Changed PartitionStateChangeLogger to StateChangeLogger everywhere.
        16. Fixed the mentioned typos.
        17. Added the state change log entries that you suggested.
        18. As discussed offline, kept the wrapper class around Utils.Logging. Providing a logIdent to this class will save us the trouble to specify the broker id in every state change log entry, and it will keep the logging consistent with the regular server logging.
        19. The successful lifecycle of a state change request will look like the following (error/discard/abort messages can be included in this sequence in case of failures) -

        On the controller - Controller %d, epoch %d sending become-leader/follower LeaderAndIsr request with correlationId %d to broker %d for partition [%s,%d]
        On a broker - [Replica Manager on Broker %d]: Handling LeaderAndIsr request correlationId %d received from controller %d epoch %d for partition [%s,%d]
        On the same broker - [Replica Manager on Broker %d]: LeaderAndIsr request correlationId %d received from controller %d epoch %d starting the become-leader/follower transition for partition [%s,%d]
        On the same broker - [Replica Manager on Broker %d]: Completed become-leader/follower transition for partition [%s,%d]
        On the same broker - [Replica Manager on Broker %d]: Handled LeaderAndIsr request correlationId %d received from controller %d epoch %d for partition [%s,%d]
        On the controller - Controller %d received response correlationId %d for a request sent to broker %d

        I think KafkaApis is not the right place to include a "received leaderAndIsr request" log entry. since KafkaApis handles all types of requests. We should rather expect to directly see a "Handling LeaderAndIsr request" log entry. On the broker, we don't know whether the received leaderAndIsr request is a become-leader or become-follower request, but I guess it's ok since we log that information in the second statement as seen above and while doing that we log correlationId.

        B. I included correlationId in the abstract class RequestOrResponse, probably all its derived classes should include a correlationId.

        C. config/log4j.properties now uses a separate controller.log and a separate state-change.log.

        D. Our kafka-run-class.sh script removes quotes passed to the command line arguments. Changed it so that the quotes can be passed as such. It's useful for passing values containing whitespaces like "2013-03-01 16:03:43,093".

        Show
        Swapnil Ghike added a comment - Uploading patch v4. The comments on the merge tool are written above. Remaining comments - A. Logging in kafka - 11. tried to make the log format more consistent - used [%s,%d] for printing topicAndPartition, logged correlationId when the logging statement was concerned with leaderAndIsrRequest, printed controllerId and controllerEpoch wherever needed. I have tried to ensure that there are no mistakes in the order of parameters to .format() etc, but it would be helpful if you could also scrutinize them as well. 12. Changed state change log logging level to trace, except for certain errors. 13. As you mentioned, it's an excellent idea to mention in ControllerChannelManager whether the leaderAndIsr request is become-leader or become-follower request. Added a change to log that. Added a change to do that on the broker in ReplicaManager. {makeLeader,makeFollower} . 14. Included correlationId in LeaderAndIsrRequest.toString 15. Changed PartitionStateChangeLogger to StateChangeLogger everywhere. 16. Fixed the mentioned typos. 17. Added the state change log entries that you suggested. 18. As discussed offline, kept the wrapper class around Utils.Logging. Providing a logIdent to this class will save us the trouble to specify the broker id in every state change log entry, and it will keep the logging consistent with the regular server logging. 19. The successful lifecycle of a state change request will look like the following (error/discard/abort messages can be included in this sequence in case of failures) - On the controller - Controller %d, epoch %d sending become-leader/follower LeaderAndIsr request with correlationId %d to broker %d for partition [%s,%d] On a broker - [Replica Manager on Broker %d] : Handling LeaderAndIsr request correlationId %d received from controller %d epoch %d for partition [%s,%d] On the same broker - [Replica Manager on Broker %d] : LeaderAndIsr request correlationId %d received from controller %d epoch %d starting the become-leader/follower transition for partition [%s,%d] On the same broker - [Replica Manager on Broker %d] : Completed become-leader/follower transition for partition [%s,%d] On the same broker - [Replica Manager on Broker %d] : Handled LeaderAndIsr request correlationId %d received from controller %d epoch %d for partition [%s,%d] On the controller - Controller %d received response correlationId %d for a request sent to broker %d I think KafkaApis is not the right place to include a "received leaderAndIsr request" log entry. since KafkaApis handles all types of requests. We should rather expect to directly see a "Handling LeaderAndIsr request" log entry. On the broker, we don't know whether the received leaderAndIsr request is a become-leader or become-follower request, but I guess it's ok since we log that information in the second statement as seen above and while doing that we log correlationId. B. I included correlationId in the abstract class RequestOrResponse, probably all its derived classes should include a correlationId. C. config/log4j.properties now uses a separate controller.log and a separate state-change.log. D. Our kafka-run-class.sh script removes quotes passed to the command line arguments. Changed it so that the quotes can be passed as such. It's useful for passing values containing whitespaces like "2013-03-01 16:03:43,093".
        Hide
        Swapnil Ghike added a comment - - edited

        There are many changes, the non-tool part of the patch is not complete yet. I will first make some comments on the tool that I have written up and give some examples below. Let me know if you want to make any changes to the tool's input/output:

        8.1 Yes
        8.2 Using dash now.
        8.3 If we provide a directory as input, then the question would be whether 1. the user needs to create this directory 2. the tool looks for certain file names
        I think we could avoid these issues by providing two options: 1. comma separated list of log files 2. log file name regex (the user would be able to specify full path of the file name regex). Examples below.

        8.4 Ok. The merge tool will look in a time interval closed at both the start time and end time specified.

        8.5 Cool. The merge tool accepts exactly one topic in arguments.
        i. If only the topic is specified, the tool will look for log entries of all its partitions.
        ii. The tool will also accept a comma separated list of partitions. When this list is specified, the topic must be specified, and the tool will look for only those entries for topic and partitions specified.
        iii. If no topic is specified in arguments, the merge tool will merge log entries of all topics.

        8.6
        i. Implemented n-way merge with output buffer of size 1MB.
        ii. The tool merges logs while maintaining chronological order. So if the tool is used to merge logs of multiple partitions or multiple topics, the output will only be ordered wrt time, but will be jumbled up wrt partitions or topics. As discussed offline, in this case, the user is expected to perform additional screening of the output.
        iii. Of course, if the user specifies only one topic and one partition, the output will not appear jumbled.

        Exmaples:

        ~/kafka/kafka/bin$ ./kafka-run-class.sh kafka.tools.stateChangeLogMerger --logs-regex ../state-change.log* --topic testfoo --partitions 0 --start-time "2013-02-28 13:41:35,891" --end-time "2013-02-28 13:41:35,959"
        [2013-02-28 13:41:35,891] INFO [Partition state machine on Controller 0]: Elected leader 0 for Offline partition [testfoo, 0] (partitionStateChangeLogger)
        [2013-02-28 13:41:35,892] INFO [Partition state machine on Controller 0]: Partition [testfoo, 0] state changed from OnlinePartition to OnlinePartition with leader 0 (partitionStateChangeLogger)
        [2013-02-28 13:41:35,900] DEBUG Controller 0, epoch 9 sending LeaderAndIsr request with correlationId 1to broker 0 for partition [testfoo,0] (partitionStateChangeLogger)
        [2013-02-28 13:41:35,923] INFO [Replica state machine on Controller 0]: Replica 0 for partition [testfoo, 0] state changed to OnlineReplica (partitionStateChangeLogger)
        [2013-02-28 13:41:35,924] DEBUG Controller 0, epoch 9 sending LeaderAndIsr request with correlationId 2to broker 0 for partition [testfoo,0] (partitionStateChangeLogger)
        [2013-02-28 13:41:35,958] INFO [Replica Manager on Broker 0]: Received LeaderAndIsr request from controller 0, epoch 2, starting leader state transition for partition [testfoo, 0] (partitionStateChangeLogger)
        [2013-02-28 13:41:35,959] INFO [Replica Manager on Broker 0]: Completed leader state transition for partition [testfoo, 0] (partitionStateChangeLogger)

        ~/kafka/kafka/bin$ ./kafka-run-class.sh kafka.tools.stateChangeLogMerger
        Provide at least one of the two arguments "[logs]" or "[logs-regex]"

        ~/kafka/kafka/bin$ ./kafka-run-class.sh kafka.tools.stateChangeLogMerger --logs ../state-change.log --partitions 0,1
        "[topic]" required with partition ids

        Show
        Swapnil Ghike added a comment - - edited There are many changes, the non-tool part of the patch is not complete yet. I will first make some comments on the tool that I have written up and give some examples below. Let me know if you want to make any changes to the tool's input/output: 8.1 Yes 8.2 Using dash now. 8.3 If we provide a directory as input, then the question would be whether 1. the user needs to create this directory 2. the tool looks for certain file names I think we could avoid these issues by providing two options: 1. comma separated list of log files 2. log file name regex (the user would be able to specify full path of the file name regex). Examples below. 8.4 Ok. The merge tool will look in a time interval closed at both the start time and end time specified. 8.5 Cool. The merge tool accepts exactly one topic in arguments. i. If only the topic is specified, the tool will look for log entries of all its partitions. ii. The tool will also accept a comma separated list of partitions. When this list is specified, the topic must be specified, and the tool will look for only those entries for topic and partitions specified. iii. If no topic is specified in arguments, the merge tool will merge log entries of all topics. 8.6 i. Implemented n-way merge with output buffer of size 1MB. ii. The tool merges logs while maintaining chronological order. So if the tool is used to merge logs of multiple partitions or multiple topics, the output will only be ordered wrt time, but will be jumbled up wrt partitions or topics. As discussed offline, in this case, the user is expected to perform additional screening of the output. iii. Of course, if the user specifies only one topic and one partition, the output will not appear jumbled. Exmaples: ~/kafka/kafka/bin$ ./kafka-run-class.sh kafka.tools.stateChangeLogMerger --logs-regex ../state-change.log* --topic testfoo --partitions 0 --start-time "2013-02-28 13:41:35,891" --end-time "2013-02-28 13:41:35,959" [2013-02-28 13:41:35,891] INFO [Partition state machine on Controller 0] : Elected leader 0 for Offline partition [testfoo, 0] (partitionStateChangeLogger) [2013-02-28 13:41:35,892] INFO [Partition state machine on Controller 0] : Partition [testfoo, 0] state changed from OnlinePartition to OnlinePartition with leader 0 (partitionStateChangeLogger) [2013-02-28 13:41:35,900] DEBUG Controller 0, epoch 9 sending LeaderAndIsr request with correlationId 1to broker 0 for partition [testfoo,0] (partitionStateChangeLogger) [2013-02-28 13:41:35,923] INFO [Replica state machine on Controller 0] : Replica 0 for partition [testfoo, 0] state changed to OnlineReplica (partitionStateChangeLogger) [2013-02-28 13:41:35,924] DEBUG Controller 0, epoch 9 sending LeaderAndIsr request with correlationId 2to broker 0 for partition [testfoo,0] (partitionStateChangeLogger) [2013-02-28 13:41:35,958] INFO [Replica Manager on Broker 0] : Received LeaderAndIsr request from controller 0, epoch 2, starting leader state transition for partition [testfoo, 0] (partitionStateChangeLogger) [2013-02-28 13:41:35,959] INFO [Replica Manager on Broker 0] : Completed leader state transition for partition [testfoo, 0] (partitionStateChangeLogger) ~/kafka/kafka/bin$ ./kafka-run-class.sh kafka.tools.stateChangeLogMerger Provide at least one of the two arguments " [logs] " or " [logs-regex] " ~/kafka/kafka/bin$ ./kafka-run-class.sh kafka.tools.stateChangeLogMerger --logs ../state-change.log --partitions 0,1 " [topic] " required with partition ids
        Hide
        Swapnil Ghike added a comment -

        Um just curious, did you look at my comments while uploading patch v2? https://issues.apache.org/jira/browse/KAFKA-513?focusedCommentId=13576308&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13576308

        It addresses some of the issues that you mentioned in points 8, 9 etc (which you had mentioned while reviewing v1 earlier).

        I will get started on your other comments.

        Show
        Swapnil Ghike added a comment - Um just curious, did you look at my comments while uploading patch v2? https://issues.apache.org/jira/browse/KAFKA-513?focusedCommentId=13576308&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13576308 It addresses some of the issues that you mentioned in points 8, 9 etc (which you had mentioned while reviewing v1 earlier). I will get started on your other comments.
        Hide
        Neha Narkhede added a comment -

        Thanks for patch v3. Overall, its pretty good, a few review comments -

        Let's standardize on the format for printing a partition. The reason I suggested we follow [%s,%d] is because most of the code uses TopicPartition and that prints partition as [%s,%d]. Saw that you changed that to add a space and we also had started with that, but after finding that most of us tend to grep for "[%s,%d]" instead, we decided to remove that space

        1. ControllerChannelManager
        1.1 Let's include correlation id in the state change log messages. This will make it easier to trace the state change messages through all brokers
        1.2 I think we should include another state change log message once the controller receives a response to the state change request. We can print correlation id here as well
        1.3 In the log message for leader and isr request, I wonder if it's worth specifying if it is either become-leader or become-follower for that particular partition. Same can be done on the broker as well.Otherwise, it still remains a mystery whether the broker was asked to become a leader or a follower. Thoughts ?

        2. LeaderAndIsrRequest
        It's good to add the controller id, should we also change the toString method to print it ?

        3. Partition
        3.1 Can we call the partitionStateChangeLogger just stateChangeLogger for simplicity ?
        3.2 The kind of information you chose to put in the state change log is useful. Can we standardize it a bit, more like on the lines of what's in ControllerChannelManager. It will be nice to know which leader and isr request is being rejected (identified by correlation id). Also which controller with what epoch had sent it ? For example,
        "Broker 1 rejected LeaderAndIsrRequest correlation id 123 from controller 2 epoch 3 for partition [foo,1] as current leader epoch 5 is >= request leader epoch 4"
        3.3 Let's fix the [%s, %d] for printing the partition in makeFollower()

        4. KafkaApis
        We should probably have a "received request" state change messages on the brokers. Probably this can go in KafkaApis. Something like -
        "Broker 1 received LeaderAndIsrRequest correlation id 123 from controller 2 epoch 3 for partition [foo,1]"
        This can be followed by a "handling request" state change message, if there is one and then "handled request" on the broker. Finally, there will be a "completed request" on the controller. This will complete the lifecycle of a state change request.

        5. PartitionStateMachine
        Can we standardize the log messages here as well ?
        "Controller 1 epoch 3 elected leader 4 for partition [foo,1]"

        6. ReplicaManager
        6.1 Like I mentioned above, let's include "handling request" in the state change log as well
        6.2 Typo -> follwer
        6.3 Let's also include the error case when the broker drops the leader and isr request sent by a stale controller epoch inside becomeLeaderOrFollower()
        6.4 Let's standardize on the log message format here as well

        7. ReplicaStateMachine
        Let's standardize on the log message format here as well

        8. PartitionStateChangeLogMerger
        8.1 Can we rename this to just StateChangeLogMerger?
        8.2 We don't use camel case in command line options for our tools. I don't think we've quite standardized on one thing, but we use either '-' or '.' to separate the words in command line options.
        8.3 It's unclear from the description of the command line option what the input to this tool is. Is it a directory that contains state change logs from all servers or is it a csv that points to a list of state change logs ? I think the former will be easier to work with since there could be multiple state change logs per broker (it is usually configured for daily rolling).
        8.4 It does seem useful to provide an optional time range to this tool. That way it doesn't have to unnecessarily merge data from files that don't fit in the query time range. Sometimes, you know the star
        t time, sometimes the end time, sometimes none or both. All of these are useful while troubleshooting
        8.5 It seems like this tool should also take the topic partition as input. Most times, I see us querying this tool for a particular partition. Here it is useful to take in topic and partition as separate
        options. If you specify only topic but not the partition, the tool should get data for all partitions for that topic. It's good that the tool can handle all topic partitions as well.
        8.6 I am not sure why the second map in partitionsMergedLog is keyed on Date ? Also, I have the same suggestion I had earlier to scale this tool. The main reason is that even if a single state change log has fewer entries, I see us using this tool across multiday history of state changes on a Kafka cluster. Typically, when you hit a hard-to-debug issue in production, the last thing you want is for your awesome tool to run out of memory since that is the best option you have to troubleshoot the issue at hand. So I suggest we look into improving the merging process in this tool.
        Currently, the merging algorithm requires streaming all the entries from each broker's state change log in memory. This runs a risk of the tool running out of memory. One way to mediate this is to basically do a n-way merge from m input files. The algorithm is something like this -
        8.6.1 Read a line (that matches the topic/partition regex and date range, if there is one) from every input file in a priority queue
        8.6.2 Take the line from the file with the earliest date and add it to an output buffer
        8.6.3 Add another line from the file in the earlier step in the priority queue
        8.6.4 Flush the output buffer every n entries
        Here n can be set to some reasonable number given the memory usage of a typical line in the state change log (Since we wrote this, we should be able to estimate this)

        9. PartitionStateChangeLogger
        What is the reason to defining this in a separate class ? It seems like a wrapper over the basic getLogger statement.

        10. Few other suggestions on the formatted output of the state change log -

        Let's remove "[Partition state machine on Controller 0]", "[Replica Manager on Broker 0]" part of the log statement. Also remove the (partitionStateChangeLogger) at the end. We also don't need to log if its INFO or DEBUG.

        Show
        Neha Narkhede added a comment - Thanks for patch v3. Overall, its pretty good, a few review comments - Let's standardize on the format for printing a partition. The reason I suggested we follow [%s,%d] is because most of the code uses TopicPartition and that prints partition as [%s,%d] . Saw that you changed that to add a space and we also had started with that, but after finding that most of us tend to grep for " [%s,%d] " instead, we decided to remove that space 1. ControllerChannelManager 1.1 Let's include correlation id in the state change log messages. This will make it easier to trace the state change messages through all brokers 1.2 I think we should include another state change log message once the controller receives a response to the state change request. We can print correlation id here as well 1.3 In the log message for leader and isr request, I wonder if it's worth specifying if it is either become-leader or become-follower for that particular partition. Same can be done on the broker as well.Otherwise, it still remains a mystery whether the broker was asked to become a leader or a follower. Thoughts ? 2. LeaderAndIsrRequest It's good to add the controller id, should we also change the toString method to print it ? 3. Partition 3.1 Can we call the partitionStateChangeLogger just stateChangeLogger for simplicity ? 3.2 The kind of information you chose to put in the state change log is useful. Can we standardize it a bit, more like on the lines of what's in ControllerChannelManager. It will be nice to know which leader and isr request is being rejected (identified by correlation id). Also which controller with what epoch had sent it ? For example, "Broker 1 rejected LeaderAndIsrRequest correlation id 123 from controller 2 epoch 3 for partition [foo,1] as current leader epoch 5 is >= request leader epoch 4" 3.3 Let's fix the [%s, %d] for printing the partition in makeFollower() 4. KafkaApis We should probably have a "received request" state change messages on the brokers. Probably this can go in KafkaApis. Something like - "Broker 1 received LeaderAndIsrRequest correlation id 123 from controller 2 epoch 3 for partition [foo,1] " This can be followed by a "handling request" state change message, if there is one and then "handled request" on the broker. Finally, there will be a "completed request" on the controller. This will complete the lifecycle of a state change request. 5. PartitionStateMachine Can we standardize the log messages here as well ? "Controller 1 epoch 3 elected leader 4 for partition [foo,1] " 6. ReplicaManager 6.1 Like I mentioned above, let's include "handling request" in the state change log as well 6.2 Typo -> follwer 6.3 Let's also include the error case when the broker drops the leader and isr request sent by a stale controller epoch inside becomeLeaderOrFollower() 6.4 Let's standardize on the log message format here as well 7. ReplicaStateMachine Let's standardize on the log message format here as well 8. PartitionStateChangeLogMerger 8.1 Can we rename this to just StateChangeLogMerger? 8.2 We don't use camel case in command line options for our tools. I don't think we've quite standardized on one thing, but we use either '-' or '.' to separate the words in command line options. 8.3 It's unclear from the description of the command line option what the input to this tool is. Is it a directory that contains state change logs from all servers or is it a csv that points to a list of state change logs ? I think the former will be easier to work with since there could be multiple state change logs per broker (it is usually configured for daily rolling). 8.4 It does seem useful to provide an optional time range to this tool. That way it doesn't have to unnecessarily merge data from files that don't fit in the query time range. Sometimes, you know the star t time, sometimes the end time, sometimes none or both. All of these are useful while troubleshooting 8.5 It seems like this tool should also take the topic partition as input. Most times, I see us querying this tool for a particular partition. Here it is useful to take in topic and partition as separate options. If you specify only topic but not the partition, the tool should get data for all partitions for that topic. It's good that the tool can handle all topic partitions as well. 8.6 I am not sure why the second map in partitionsMergedLog is keyed on Date ? Also, I have the same suggestion I had earlier to scale this tool. The main reason is that even if a single state change log has fewer entries, I see us using this tool across multiday history of state changes on a Kafka cluster. Typically, when you hit a hard-to-debug issue in production, the last thing you want is for your awesome tool to run out of memory since that is the best option you have to troubleshoot the issue at hand. So I suggest we look into improving the merging process in this tool. Currently, the merging algorithm requires streaming all the entries from each broker's state change log in memory. This runs a risk of the tool running out of memory. One way to mediate this is to basically do a n-way merge from m input files. The algorithm is something like this - 8.6.1 Read a line (that matches the topic/partition regex and date range, if there is one) from every input file in a priority queue 8.6.2 Take the line from the file with the earliest date and add it to an output buffer 8.6.3 Add another line from the file in the earlier step in the priority queue 8.6.4 Flush the output buffer every n entries Here n can be set to some reasonable number given the memory usage of a typical line in the state change log (Since we wrote this, we should be able to estimate this) 9. PartitionStateChangeLogger What is the reason to defining this in a separate class ? It seems like a wrapper over the basic getLogger statement. 10. Few other suggestions on the formatted output of the state change log - Let's remove " [Partition state machine on Controller 0] ", " [Replica Manager on Broker 0] " part of the log statement. Also remove the (partitionStateChangeLogger) at the end. We also don't need to log if its INFO or DEBUG.
        Hide
        Swapnil Ghike added a comment -

        Rebased. All comments are the same as those on patch v2.

        Show
        Swapnil Ghike added a comment - Rebased. All comments are the same as those on patch v2.
        Hide
        Neha Narkhede added a comment -

        My apologies for reaching this JIRA late, do you mind rebasing I will take a look immediately.

        Show
        Neha Narkhede added a comment - My apologies for reaching this JIRA late, do you mind rebasing I will take a look immediately.
        Hide
        Swapnil Ghike added a comment -

        Patch v2:

        1.1.a Statements other than "abort transition" are logged in ReplicaManager. Also please refer to 3 below.
        1.1.b Logging statements for controller seem to be of the format "controller sent a LeaderAndIsr request to broker %d", pls lmk if I should change them. Also pls refer to 2.1.b below.
        1.2 Will KAFK-649 take care of this? Also, it seems like [topic, partition] is more common in our current code.
        1.3 Made the change.

        2.1.a I have created a separate file controller.log in log4j.properties. Earlier all the controller logging statements were sent to state-change.log
        2.1.b Ignored all trace statements from state-change log. Should the debug statement in ControllerBrokerRequestBatch.sendRequestsToBrokers be in state-change.log? Ignored other debug statements from state-change log.
        2.2 Made the change.

        3 We should probably keep it, since it's nice to have the logIdent identifier at the beginning of each logging statement in state-change.log. Lmk what you think.

        4. Made the change.

        5. Rebase took care of this.

        6.1 Changed the input name to 'stateChangeLog', hopefully the description is also clearer now.
        6.2 I think it's ok if we merge everything together. Grepping for a topic or a partition is straightforward.
        6.3 Hmm, so I tried to merge 6 files each of 11k lines. The total memory consumed did not rise above 200MB. To optimize a bit, I replaced the immutable.TreeMap used from the last patch with mutable.HashMap. These (date --> lines) hashmaps are sorted by converting each to a sequence before printing, sorting should be ok since each of such hashmaps will contain only a handful of entries for a single [topic, partition].

        Show
        Swapnil Ghike added a comment - Patch v2: 1.1.a Statements other than "abort transition" are logged in ReplicaManager. Also please refer to 3 below. 1.1.b Logging statements for controller seem to be of the format "controller sent a LeaderAndIsr request to broker %d", pls lmk if I should change them. Also pls refer to 2.1.b below. 1.2 Will KAFK-649 take care of this? Also, it seems like [topic, partition] is more common in our current code. 1.3 Made the change. 2.1.a I have created a separate file controller.log in log4j.properties. Earlier all the controller logging statements were sent to state-change.log 2.1.b Ignored all trace statements from state-change log. Should the debug statement in ControllerBrokerRequestBatch.sendRequestsToBrokers be in state-change.log? Ignored other debug statements from state-change log. 2.2 Made the change. 3 We should probably keep it, since it's nice to have the logIdent identifier at the beginning of each logging statement in state-change.log. Lmk what you think. 4. Made the change. 5. Rebase took care of this. 6.1 Changed the input name to 'stateChangeLog', hopefully the description is also clearer now. 6.2 I think it's ok if we merge everything together. Grepping for a topic or a partition is straightforward. 6.3 Hmm, so I tried to merge 6 files each of 11k lines. The total memory consumed did not rise above 200MB. To optimize a bit, I replaced the immutable.TreeMap used from the last patch with mutable.HashMap. These (date --> lines) hashmaps are sorted by converting each to a sequence before printing, sorting should be ok since each of such hashmaps will contain only a handful of entries for a single [topic, partition] .
        Hide
        Neha Narkhede added a comment -

        Thanks for patch v1. Overall looks pretty good, a few review suggestions and questions -

        1. Partition.scala
        1.1 The state change log is a timeline of state change requests sent by the controller and state change requests completed by the broker. On the broker side, it will be nice to see something like this -

        Controller 3 sent the follower state transition for [foo,1] to broker 1
        Broker 1 started follower state transition for [foo,1] to leader 2
        Broker 1 completed follower state transition for [foo,1] to leader 2
        Broker 1 aborted follower state transition for [foo,1] to leader 2
        1.2 It will be good to standardize on our logging style for partition. There are some log statements that use [topic, partition], others that use [topic,partition] and some more that use "topic %s partition %d". I personally prefer [topic,partition] since it is concise and does not have extra white spaces.
        1.3 Given that we already have one log statement on the broker stating start of the state transition, I think we can get rid of the following from the state change log
        Started to become follower at the request %s

        2. PartitionStateMachine
        2.1 There are 2 ways I see the log messages from controller being logged. One is a specialized formatted log that is the state change log. This is used by the state change log merge tool to create a timeline of state changes for a particular partition. Another is re-directing log4j statements from kafka.controller to a separate log4j appender. The purpose would be keeping the log statements specific to controller from polluting the broker logs since presumably only Kafka developers or advanced users would be able to make sense of it. Having said that, I think we can keep the debug statements and anything that helps troubleshooting out of the specialized state change log since that will be captured in the controller's log anyway.
        2.2 The leader election state change log messages can be something like
        Controller 3 started leader election for [foo,1]
        Controller 3 completed leader election for [foo,1]

        3. PartitionStateChangeLogger
        Since this is a thin wrapper over the Logging trait, do we really need this class at all ? Or, to maintain consistency, we can just use it as a standard logger similar to the requestLogger in KafkaApis.

        4. ReplicaStateMachine
        Given that we already have the state change message for OfflineReplica, I think we don't need the following message about removing it from the ISR

        5. log4j.properties
        How about calling it just a stateChangeAppender and the log a state_change.log ?

        6. PartitionStateChangeLogMerger
        6.1 It's unclear what the input to this tool is. It will help to give a more detailed description to the input options.
        6.2 The way I see it, there are at least 3 ways to merge the state change logs - Filter all the state changes on a partition or all partitions for a topic or everything. We should probably provide these options as input
        6.3 Currently, the merging algorithm requires streaming all the entries from each broker's state change log in memory. I wonder if this runs a risk of running out of memory. On one hand, I think there should be fewer state change log entries, so this approach should work just fine. On the other hand, if there is a problem, leading to a lot of state change traffic, this tool might not hold up. One way to mediate this is to basically do a n-way merge from m input files. The algorithm is something like this -
        6.3.1 Read a line (that matches the topic/partition regex if there is one) from every input file in a priority queue
        6.3.2 Take the line from the file with the earliest date and add it to an output buffer
        6.3.3 Add another line from the file in the earlier step in the priority queue
        6.3.4 Flush the output buffer every n entries
        This approach will bound the memory required by this tool to the amount of memory required to store roughly n log entries

        Show
        Neha Narkhede added a comment - Thanks for patch v1. Overall looks pretty good, a few review suggestions and questions - 1. Partition.scala 1.1 The state change log is a timeline of state change requests sent by the controller and state change requests completed by the broker. On the broker side, it will be nice to see something like this - Controller 3 sent the follower state transition for [foo,1] to broker 1 Broker 1 started follower state transition for [foo,1] to leader 2 Broker 1 completed follower state transition for [foo,1] to leader 2 Broker 1 aborted follower state transition for [foo,1] to leader 2 1.2 It will be good to standardize on our logging style for partition. There are some log statements that use [topic, partition] , others that use [topic,partition] and some more that use "topic %s partition %d". I personally prefer [topic,partition] since it is concise and does not have extra white spaces. 1.3 Given that we already have one log statement on the broker stating start of the state transition, I think we can get rid of the following from the state change log Started to become follower at the request %s 2. PartitionStateMachine 2.1 There are 2 ways I see the log messages from controller being logged. One is a specialized formatted log that is the state change log. This is used by the state change log merge tool to create a timeline of state changes for a particular partition. Another is re-directing log4j statements from kafka.controller to a separate log4j appender. The purpose would be keeping the log statements specific to controller from polluting the broker logs since presumably only Kafka developers or advanced users would be able to make sense of it. Having said that, I think we can keep the debug statements and anything that helps troubleshooting out of the specialized state change log since that will be captured in the controller's log anyway. 2.2 The leader election state change log messages can be something like Controller 3 started leader election for [foo,1] Controller 3 completed leader election for [foo,1] 3. PartitionStateChangeLogger Since this is a thin wrapper over the Logging trait, do we really need this class at all ? Or, to maintain consistency, we can just use it as a standard logger similar to the requestLogger in KafkaApis. 4. ReplicaStateMachine Given that we already have the state change message for OfflineReplica, I think we don't need the following message about removing it from the ISR 5. log4j.properties How about calling it just a stateChangeAppender and the log a state_change.log ? 6. PartitionStateChangeLogMerger 6.1 It's unclear what the input to this tool is. It will help to give a more detailed description to the input options. 6.2 The way I see it, there are at least 3 ways to merge the state change logs - Filter all the state changes on a partition or all partitions for a topic or everything. We should probably provide these options as input 6.3 Currently, the merging algorithm requires streaming all the entries from each broker's state change log in memory. I wonder if this runs a risk of running out of memory. On one hand, I think there should be fewer state change log entries, so this approach should work just fine. On the other hand, if there is a problem, leading to a lot of state change traffic, this tool might not hold up. One way to mediate this is to basically do a n-way merge from m input files. The algorithm is something like this - 6.3.1 Read a line (that matches the topic/partition regex if there is one) from every input file in a priority queue 6.3.2 Take the line from the file with the earliest date and add it to an output buffer 6.3.3 Add another line from the file in the earlier step in the priority queue 6.3.4 Flush the output buffer every n entries This approach will bound the memory required by this tool to the amount of memory required to store roughly n log entries
        Hide
        Swapnil Ghike added a comment -

        Patch v1:
        1. Created a tool PartitionStateChangeLogMerger

        • It looks for a topicAndPartition in the format [%s, %d] and a data in the format "yyyy-MM-dd HH:mm:ss,SSS".
        • Uses a TreeMap to maintain a log for each topicAndPartition sorted according to date.
        • Run it as ./kafka-run-class.sh kafka.tools.PartitionStateChangeLogMerger --file testfile --file testfile1

        2. Added a class PartitionStateChangeLogger.

        • It uses a separate logger in log4j properties file and a fileAppender.
        • It uses [%d] %p %m%n format.
          eg -
          [2012-12-12 18:27:46,913] DEBUG preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@70922804, name=log4j:logger=partitionStateChangeLogger
          [2012-12-12 18:27:49,995] INFO [Replica state machine on Controller 0]: Replica 0 for partition [foo, 0] state changed to OnlineReplica
          [2012-12-12 18:27:49,995] INFO [Replica state machine on Controller 0]: Replica 0 for partition [too, 0] state changed to OnlineReplica
          [2012-12-12 18:27:50,004] DEBUG Controller 0, epoch 4 sending LeaderAndIsr request to broker 0 for partition [foo, 0]
          [2012-12-12 18:27:50,004] DEBUG Controller 0, epoch 4 sending LeaderAndIsr request to broker 0 for partition [too, 0]
          [2012-12-12 18:27:50,048] INFO [Replica Manager on Broker 0]: Received LeaderAndIsr request from controller 0, epoch 0, becoming Leader for partition [foo, 0]
          [2012-12-12 18:27:50,051] TRACE Partition [foo, 0] on broker 0: Started to become leader at the request { "ISR":"0","leader":"0","leaderEpoch":"0" }

          [2012-12-12 18:27:50,062] INFO [Replica Manager on Broker 0]: Completed the leader state transition for partition [foo, 0]
          [2012-12-12 18:27:50,062] INFO [Replica Manager on Broker 0]: Received LeaderAndIsr request from controller 0, epoch 0, becoming Leader for partition [too, 0]
          [2012-12-12 18:27:50,062] TRACE Partition [too, 0] on broker 0: Started to become leader at the request

          { "ISR":"0","leader":"0","leaderEpoch":"0" }

          [2012-12-12 18:27:50,063] INFO [Replica Manager on Broker 0]: Completed the leader state transition for partition [too, 0]

        I am not sure how to prevent it from printing the first statement.

        3. A bunch of logging statements use PartitionStateChangeLogger in Partition, PartitionStateMachine, ReplicaStateMachine, ReplicaManager, ControllerChannelManager,

        4. Included controllerId in LeaderAndIsrRequest via ControllerBrokerRequestBatch.

        5. Modification to two unit tests because of the change in LeaderAndIsrRequest constructor.

        Testing done -
        Unit tests, Sanity test, merged two test log files.

        What should be the location of the partition state change log file?

        Show
        Swapnil Ghike added a comment - Patch v1: 1. Created a tool PartitionStateChangeLogMerger It looks for a topicAndPartition in the format [%s, %d] and a data in the format "yyyy-MM-dd HH:mm:ss,SSS". Uses a TreeMap to maintain a log for each topicAndPartition sorted according to date. Run it as ./kafka-run-class.sh kafka.tools.PartitionStateChangeLogMerger --file testfile --file testfile1 2. Added a class PartitionStateChangeLogger. It uses a separate logger in log4j properties file and a fileAppender. It uses [%d] %p %m%n format. eg - [2012-12-12 18:27:46,913] DEBUG preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@70922804, name=log4j:logger=partitionStateChangeLogger [2012-12-12 18:27:49,995] INFO [Replica state machine on Controller 0] : Replica 0 for partition [foo, 0] state changed to OnlineReplica [2012-12-12 18:27:49,995] INFO [Replica state machine on Controller 0] : Replica 0 for partition [too, 0] state changed to OnlineReplica [2012-12-12 18:27:50,004] DEBUG Controller 0, epoch 4 sending LeaderAndIsr request to broker 0 for partition [foo, 0] [2012-12-12 18:27:50,004] DEBUG Controller 0, epoch 4 sending LeaderAndIsr request to broker 0 for partition [too, 0] [2012-12-12 18:27:50,048] INFO [Replica Manager on Broker 0] : Received LeaderAndIsr request from controller 0, epoch 0, becoming Leader for partition [foo, 0] [2012-12-12 18:27:50,051] TRACE Partition [foo, 0] on broker 0: Started to become leader at the request { "ISR":"0","leader":"0","leaderEpoch":"0" } [2012-12-12 18:27:50,062] INFO [Replica Manager on Broker 0] : Completed the leader state transition for partition [foo, 0] [2012-12-12 18:27:50,062] INFO [Replica Manager on Broker 0] : Received LeaderAndIsr request from controller 0, epoch 0, becoming Leader for partition [too, 0] [2012-12-12 18:27:50,062] TRACE Partition [too, 0] on broker 0: Started to become leader at the request { "ISR":"0","leader":"0","leaderEpoch":"0" } [2012-12-12 18:27:50,063] INFO [Replica Manager on Broker 0] : Completed the leader state transition for partition [too, 0] I am not sure how to prevent it from printing the first statement. 3. A bunch of logging statements use PartitionStateChangeLogger in Partition, PartitionStateMachine, ReplicaStateMachine, ReplicaManager, ControllerChannelManager, 4. Included controllerId in LeaderAndIsrRequest via ControllerBrokerRequestBatch. 5. Modification to two unit tests because of the change in LeaderAndIsrRequest constructor. Testing done - Unit tests, Sanity test, merged two test log files. What should be the location of the partition state change log file?
        Hide
        Jay Kreps added a comment -

        It would be nice if we updated our log4j.properties as part of this ticket so that this log went to a different log file (and not to console) since this is meant for debugging and will confuse everyone except for Neha . Would probably make it easier to read the state transitions too...

        Show
        Jay Kreps added a comment - It would be nice if we updated our log4j.properties as part of this ticket so that this log went to a different log file (and not to console) since this is meant for debugging and will confuse everyone except for Neha . Would probably make it easier to read the state transitions too...
        Hide
        Neha Narkhede added a comment -

        I think we cannot release without this functionality since it will be a debugging nightmare if we were to hit any leader election bug post 0.8 deployment. I'm performance testing 0.8 right now, and found another bug that randomly fails produce requests because the leader election hasn't completed/started for a certain partition at a certain time. If we had this feature, that effort would be greatly simplified

        Show
        Neha Narkhede added a comment - I think we cannot release without this functionality since it will be a debugging nightmare if we were to hit any leader election bug post 0.8 deployment. I'm performance testing 0.8 right now, and found another bug that randomly fails produce requests because the leader election hasn't completed/started for a certain partition at a certain time. If we had this feature, that effort would be greatly simplified
        Hide
        Neha Narkhede added a comment -

        Swapnil, I think this JIRA is missing your patch. Please can you attach it ?

        Show
        Neha Narkhede added a comment - Swapnil, I think this JIRA is missing your patch. Please can you attach it ?
        Hide
        Swapnil Ghike added a comment -

        Have finished writing down all functionality. The tool accepts a list of log files and an optional list of topics, and it seems to be doing the merging part fine. Currently waiting for KAFKA-532 which I believe will add controller id to leaderAndIsr (this jira needs it for logging controller id in the partition state change log).

        The first patch might need a review to make sure that no additional logging is necessary and for any code refactoring.

        Show
        Swapnil Ghike added a comment - Have finished writing down all functionality. The tool accepts a list of log files and an optional list of topics, and it seems to be doing the merging part fine. Currently waiting for KAFKA-532 which I believe will add controller id to leaderAndIsr (this jira needs it for logging controller id in the partition state change log). The first patch might need a review to make sure that no additional logging is necessary and for any code refactoring.
        Hide
        Prashanth Menon added a comment -

        Hi guys - still haven't had an opportunity to look into this. It's probably best if I stop blocking this and let someone else take it

        Show
        Prashanth Menon added a comment - Hi guys - still haven't had an opportunity to look into this. It's probably best if I stop blocking this and let someone else take it
        Hide
        Swapnil Ghike added a comment -

        Hi Prashanth,

        How far along are you in this jira? If you are busy, I can take a look at it. Also if you have some unfinished patch, I can take it from there too.

        Swapnil

        Show
        Swapnil Ghike added a comment - Hi Prashanth, How far along are you in this jira? If you are busy, I can take a look at it. Also if you have some unfinished patch, I can take it from there too. Swapnil
        Hide
        Neha Narkhede added a comment -

        Prashanth, how is the work on this going, do you need any help here ?

        Show
        Neha Narkhede added a comment - Prashanth, how is the work on this going, do you need any help here ?
        Hide
        Prashanth Menon added a comment -

        I'm all onboard with reusing Kafka itself to publish these state-change logs to a topic (I believe RabbitMQ does something similar) but we should probably wait until 0.8 is stable. Hiding it behind an event-bus-like interface leaves room to implement such real-time visualization, though. Perhaps we can start with the event-bus interface and an implmentation that can be mixed in with the Logging trait to start. The implementation will just use the local logging functionality and we'll create a tool that functions as mentioned above to perform the merge. Periodic rsyncs, though tedious, will be required until we're confident enough to start reusing a Kafka topic for these state-change events.

        Show
        Prashanth Menon added a comment - I'm all onboard with reusing Kafka itself to publish these state-change logs to a topic (I believe RabbitMQ does something similar) but we should probably wait until 0.8 is stable. Hiding it behind an event-bus-like interface leaves room to implement such real-time visualization, though. Perhaps we can start with the event-bus interface and an implmentation that can be mixed in with the Logging trait to start. The implementation will just use the local logging functionality and we'll create a tool that functions as mentioned above to perform the merge. Periodic rsyncs, though tedious, will be required until we're confident enough to start reusing a Kafka topic for these state-change events.
        Hide
        Neha Narkhede added a comment -

        log4j appender providing a real time visualization seems like a pretty cool idea. One of the concerns of using the same cluster for this data is that this timeline is meant for debugging purposes when things go wrong (due to some bug). When that happens, it probably even means this state-change topic is not reliable. Since 0.8 is still immature, I would imagine an offline tool might serve the purpose. Thoughts ?

        Show
        Neha Narkhede added a comment - log4j appender providing a real time visualization seems like a pretty cool idea. One of the concerns of using the same cluster for this data is that this timeline is meant for debugging purposes when things go wrong (due to some bug). When that happens, it probably even means this state-change topic is not reliable. Since 0.8 is still immature, I would imagine an offline tool might serve the purpose. Thoughts ?
        Hide
        Joel Koshy added a comment -

        I think ideally, we should have an internal event-bus interface, with log4j
        being one of the output implementations. i.e., instead of directly logging
        info/error/etc. it would be say, eventBus.info/error... In the future, we
        could plug in a kafka-based event collector - i.e., produce these events to
        a Kafka topic which would facilitate real-time event collection to implement
        say, a real-time visualization of what's going on in the cluster (as opposed
        to intermittent rsyncs of kafka log4j logs). Since 0.8 is still
        under-development (and even if it is stable) I'm not sure it's a good idea
        to make it self-hosting. i.e., it may be better to send events in such an
        implementation to a separate Kafka cluster. OTOH we could use the same
        cluster given that the log4j implementation's events are available for backup.

        Show
        Joel Koshy added a comment - I think ideally, we should have an internal event-bus interface, with log4j being one of the output implementations. i.e., instead of directly logging info/error/etc. it would be say, eventBus.info/error... In the future, we could plug in a kafka-based event collector - i.e., produce these events to a Kafka topic which would facilitate real-time event collection to implement say, a real-time visualization of what's going on in the cluster (as opposed to intermittent rsyncs of kafka log4j logs). Since 0.8 is still under-development (and even if it is stable) I'm not sure it's a good idea to make it self-hosting. i.e., it may be better to send events in such an implementation to a separate Kafka cluster. OTOH we could use the same cluster given that the log4j implementation's events are available for backup.
        Hide
        Prashanth Menon added a comment -

        Hi all,

        Wow, a lot has changed in the few weeks I wasn't concentrating on Kafka. I think I've managed to grok most of the modifications, so what I'd like to do for this ticket is:

        1. Add the logging capability to PartitionStateMachine and KafkaApis.
        2. Add utility (in contrib?) to "merge" multiple such log files from multiple brokers to form a coherent timeline. This relies on the timestamp being in the log file and broker's clocks being in-sync. The format of the lines in the log file need to be in some predictable and parse-able format to make things easier too.

        Show
        Prashanth Menon added a comment - Hi all, Wow, a lot has changed in the few weeks I wasn't concentrating on Kafka. I think I've managed to grok most of the modifications, so what I'd like to do for this ticket is: 1. Add the logging capability to PartitionStateMachine and KafkaApis. 2. Add utility (in contrib?) to "merge" multiple such log files from multiple brokers to form a coherent timeline. This relies on the timestamp being in the log file and broker's clocks being in-sync. The format of the lines in the log file need to be in some predictable and parse-able format to make things easier too.
        Hide
        Prashanth Menon added a comment -

        Hey, to be honest I haven't looked into yet. I'd like to spend some time this weekend to get something out, if that's okay.

        Show
        Prashanth Menon added a comment - Hey, to be honest I haven't looked into yet. I'd like to spend some time this weekend to get something out, if that's okay.
        Hide
        Neha Narkhede added a comment -

        Hey Prashanth, just checking in to see how this is coming along..

        Show
        Neha Narkhede added a comment - Hey Prashanth, just checking in to see how this is coming along..
        Hide
        Neha Narkhede added a comment -

        Thanks for volunteering Prashanth ! You can take it, I'm almost done with KAFKA-499 and can check it in by tomorrow. That unblocks work on this JIRA.

        Show
        Neha Narkhede added a comment - Thanks for volunteering Prashanth ! You can take it, I'm almost done with KAFKA-499 and can check it in by tomorrow. That unblocks work on this JIRA.
        Hide
        Prashanth Menon added a comment -

        Mind if I take a stab at this, or did you want to take it on?

        Show
        Prashanth Menon added a comment - Mind if I take a stab at this, or did you want to take it on?

          People

          • Assignee:
            Swapnil Ghike
            Reporter:
            Neha Narkhede
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 96h
              96h
              Remaining:
              Remaining Estimate - 96h
              96h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development