This patch has a bunch of refactoring changes and a couple of new additions.
Addressing Jun's comments:
These are all great catches! Thanks for being so thorough.
60. By default, metrics-core will return an existing metric object of the same name using a getOrCreate() like functionality. As discussed offline, we should fail the clients that use an already registered clientId name. We will need to create two objects thaty contain hashmaps to record the existing producer and consumer clientIds and methods to throw an exception if a client attempts to use an existing clientId. I worked on this change a bit, but it breaks a lot of our unit tests (about half) and the refactoring will take some time. Hence, I think it will be better if I submit a patch for all other changes and create another patch for this issue under this jira. Until then we can keep this jira open.
61. For recording stats about all topics, I am now using a string "All.Topics". Since '.' is not allowed in the legal character set for topic names, this will differentiate from a topic named AllTopics.
62. Yes, we should validate groupId. Added the functionality and a unit test. It has the same validation rules as ClientId.
63. A metric name is something like (clientId + topic + some string) and this entire string is limited by fillename size. We already allow topic name to be at most 255 bytes long. We could fix max lengths for each of clientId, groupId, topic name so that the metric name never exceeds filename size. But those lengths will be quite arbitrary, perhaps we should skip the check on the length of clientId and groupId.
64. Removed brokerInfo from the clientId used to instantiate FetchRequestBuilder.
1. Moved validation of clientId at the end of instantiation of ProducerConfig and ConsumerConfig.
- Created static objects ProducerConfig and ConsumerConfig which contain a validate() method.
2. Created global *Registry objects in which each high level Producer and Consumer can register their *stats objects.
- These objects are registered in the static object only once using utils.Pool.getAndMaybePut functionality.
- This will remove the need to pass *stats objects around the code in constructors (I thought having the metrics objects right up in the constructors was a bit intrusive, since one doesn't quite always think about the monitoring mechanism while instantiating various modules of the program, for example while unit testing.)
- Instead of the constructor, each concerned class obtains the *Stats objects from the global registry object.
- This cleans up any metrics objects created in the unit tests.
- Special mention: The producer constructors are back to the old themselves. With clientId validation moved to *Config objects, the intermediate Producer constructor that merely separated the parameters of a quadruplet is gone.
3. Created separate files
- for ProducerStats, ProducerTopicStats, ProducerRequestStats in kafka.producer package and for FetchRequestAndResponseStats in kafka.consumer package. Thought it was appropriate given that we already had ConsumerTopicStats in a separate file, and since the code for metrics had increased in size due to addition of Registry and Aggregated objects. Added comments.
- for objects Topic, ClientId and GroupId in kafka.utils package.
- to move the helper case classes ClientIdAndTopic, ClientIdAndBroker to kafka.common package.
4. Renamed a few variables to easier names (anyOldName to "metricId" change).
1. Added two objects to aggregate metrics recorded by SyncProducers and SimpleConsumers at the high level Producer and Consumer.
- For this, changed KafkaTimer to accept a list of Timers. Typically we will pass a specificTimer and a globalTimer to this KafkaTimer class. Created a new KafkaHistogram in a similar way.
2. Validation of groupId.
1. Initializing the aggregator metrics with default values: For example, let's say that a syncProducer could be created (which will register a ProducerRequestStats mbean for this syncProducer). However, if no request is sent by this syncProducer then the absense of its data is not reflected in the aggregator histogram. For instance, the min requestSize for the syncProducer that never sent a request will be 0, but this won't be accurately represented in the aggregator histogram. Thus, we need to understand that if the request count of a syncProducer is 0, then its data will not be accurately reflected in the aggregator histogram.
The question is whether it is possible to inform the aggregator histogram of some default values without increasing the request count of any syncProducer or the aggregated stats.
Further proposed changes:
Another patch under this jira to address comment 60 by Jun.