Thanks for patch v3. Overall, its pretty good, a few review comments -
Let's standardize on the format for printing a partition. The reason I suggested we follow [%s,%d] is because most of the code uses TopicPartition and that prints partition as [%s,%d]. Saw that you changed that to add a space and we also had started with that, but after finding that most of us tend to grep for "[%s,%d]" instead, we decided to remove that space
1.1 Let's include correlation id in the state change log messages. This will make it easier to trace the state change messages through all brokers
1.2 I think we should include another state change log message once the controller receives a response to the state change request. We can print correlation id here as well
1.3 In the log message for leader and isr request, I wonder if it's worth specifying if it is either become-leader or become-follower for that particular partition. Same can be done on the broker as well.Otherwise, it still remains a mystery whether the broker was asked to become a leader or a follower. Thoughts ?
It's good to add the controller id, should we also change the toString method to print it ?
3.1 Can we call the partitionStateChangeLogger just stateChangeLogger for simplicity ?
3.2 The kind of information you chose to put in the state change log is useful. Can we standardize it a bit, more like on the lines of what's in ControllerChannelManager. It will be nice to know which leader and isr request is being rejected (identified by correlation id). Also which controller with what epoch had sent it ? For example,
"Broker 1 rejected LeaderAndIsrRequest correlation id 123 from controller 2 epoch 3 for partition [foo,1] as current leader epoch 5 is >= request leader epoch 4"
3.3 Let's fix the [%s, %d] for printing the partition in makeFollower()
We should probably have a "received request" state change messages on the brokers. Probably this can go in KafkaApis. Something like -
"Broker 1 received LeaderAndIsrRequest correlation id 123 from controller 2 epoch 3 for partition [foo,1]"
This can be followed by a "handling request" state change message, if there is one and then "handled request" on the broker. Finally, there will be a "completed request" on the controller. This will complete the lifecycle of a state change request.
Can we standardize the log messages here as well ?
"Controller 1 epoch 3 elected leader 4 for partition [foo,1]"
6.1 Like I mentioned above, let's include "handling request" in the state change log as well
6.2 Typo -> follwer
6.3 Let's also include the error case when the broker drops the leader and isr request sent by a stale controller epoch inside becomeLeaderOrFollower()
6.4 Let's standardize on the log message format here as well
Let's standardize on the log message format here as well
8.1 Can we rename this to just StateChangeLogMerger?
8.2 We don't use camel case in command line options for our tools. I don't think we've quite standardized on one thing, but we use either '-' or '.' to separate the words in command line options.
8.3 It's unclear from the description of the command line option what the input to this tool is. Is it a directory that contains state change logs from all servers or is it a csv that points to a list of state change logs ? I think the former will be easier to work with since there could be multiple state change logs per broker (it is usually configured for daily rolling).
8.4 It does seem useful to provide an optional time range to this tool. That way it doesn't have to unnecessarily merge data from files that don't fit in the query time range. Sometimes, you know the star
t time, sometimes the end time, sometimes none or both. All of these are useful while troubleshooting
8.5 It seems like this tool should also take the topic partition as input. Most times, I see us querying this tool for a particular partition. Here it is useful to take in topic and partition as separate
options. If you specify only topic but not the partition, the tool should get data for all partitions for that topic. It's good that the tool can handle all topic partitions as well.
8.6 I am not sure why the second map in partitionsMergedLog is keyed on Date ? Also, I have the same suggestion I had earlier to scale this tool. The main reason is that even if a single state change log has fewer entries, I see us using this tool across multiday history of state changes on a Kafka cluster. Typically, when you hit a hard-to-debug issue in production, the last thing you want is for your awesome tool to run out of memory since that is the best option you have to troubleshoot the issue at hand. So I suggest we look into improving the merging process in this tool.
Currently, the merging algorithm requires streaming all the entries from each broker's state change log in memory. This runs a risk of the tool running out of memory. One way to mediate this is to basically do a n-way merge from m input files. The algorithm is something like this -
8.6.1 Read a line (that matches the topic/partition regex and date range, if there is one) from every input file in a priority queue
8.6.2 Take the line from the file with the earliest date and add it to an output buffer
8.6.3 Add another line from the file in the earlier step in the priority queue
8.6.4 Flush the output buffer every n entries
Here n can be set to some reasonable number given the memory usage of a typical line in the state change log (Since we wrote this, we should be able to estimate this)
What is the reason to defining this in a separate class ? It seems like a wrapper over the basic getLogger statement.
10. Few other suggestions on the formatted output of the state change log -
Let's remove "[Partition state machine on Controller 0]", "[Replica Manager on Broker 0]" part of the log statement. Also remove the (partitionStateChangeLogger) at the end. We also don't need to log if its INFO or DEBUG.