Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-1239 New producer checklist
  3. KAFKA-1359

Add topic/broker metrics once new topic/broker is discovered

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.2.0
    • Component/s: producer
    • Labels:
      None

      Description

      Today some topic/broker level metrics are only added the first time such an event (record-retry, record-error, etc) happens. This has a potential issue for customized mbean reporter which needs to register all the sensors at the time the new broker/topic is discovered. It is better to add such metrics at the very beginning when new topic/brokers are discovered.

      1. KAFKA-1359_2014-04-10_10:11:40.patch
        41 kB
        Guozhang Wang
      2. KAFKA-1359_2014-04-11_14:20:45.patch
        41 kB
        Guozhang Wang
      3. KAFKA-1359_2014-04-16_09:53:55.patch
        19 kB
        Guozhang Wang
      4. KAFKA-1359_2014-04-21_11:45:53.patch
        17 kB
        Guozhang Wang
      5. KAFKA-1359_2014-04-21_16:29:02.patch
        58 kB
        Guozhang Wang
      6. KAFKA-1359_2014-04-21_16:30:57.patch
        58 kB
        Guozhang Wang
      7. KAFKA-1359_2014-04-21_16:33:22.patch
        58 kB
        Guozhang Wang
      8. KAFKA-1359.patch
        17 kB
        Guozhang Wang
      9. KAFKA-1359.patch
        17 kB
        Guozhang Wang
      10. KAFKA-1359.patch
        75 kB
        Guozhang Wang

        Activity

        Hide
        guozhang Guozhang Wang added a comment -

        Created reviewboard https://reviews.apache.org/r/20050/
        against branch origin/trunk

        Show
        guozhang Guozhang Wang added a comment - Created reviewboard https://reviews.apache.org/r/20050/ against branch origin/trunk
        Hide
        guozhang Guozhang Wang added a comment -

        Updated reviewboard https://reviews.apache.org/r/20050/
        against branch origin/trunk

        Show
        guozhang Guozhang Wang added a comment - Updated reviewboard https://reviews.apache.org/r/20050/ against branch origin/trunk
        Hide
        guozhang Guozhang Wang added a comment -

        Updated reviewboard https://reviews.apache.org/r/20050/
        against branch origin/trunk

        Show
        guozhang Guozhang Wang added a comment - Updated reviewboard https://reviews.apache.org/r/20050/ against branch origin/trunk
        Hide
        jkreps Jay Kreps added a comment -

        I'm +1 on the 1st patch. I think let's have a separate discussion on how to best model the hierarchical metrics. I'd prefer to avoid the global Metrics class if possible.

        Show
        jkreps Jay Kreps added a comment - I'm +1 on the 1st patch. I think let's have a separate discussion on how to best model the hierarchical metrics. I'd prefer to avoid the global Metrics class if possible.
        Hide
        guozhang Guozhang Wang added a comment -

        Updated reviewboard https://reviews.apache.org/r/20050/
        against branch origin/trunk

        Show
        guozhang Guozhang Wang added a comment - Updated reviewboard https://reviews.apache.org/r/20050/ against branch origin/trunk
        Hide
        jkreps Jay Kreps added a comment -

        Applied.

        Show
        jkreps Jay Kreps added a comment - Applied.
        Hide
        guozhang Guozhang Wang added a comment -

        One follow-up observation is that we used System.nanoTime the the parameters in record() and measure(), while System.nanoTime itself does not necessarily represent the current time in nanoseconds, but just the cpu counter in the JVM in the granularity of nanoseconds. So when we use it to measure metadata age and compare the value with System.currentTimeMillis that will give us "undefined" behavior.

        In general, for our use case, I think we do not need to use System.nanoTime which is mainly for measuring elapsed time in very high accuracy, while we use the timestamp mainly for measuring rates, and window boundaries for samples. For those I think milliseconds are good enough. Jay Kreps What do you think?

        Show
        guozhang Guozhang Wang added a comment - One follow-up observation is that we used System.nanoTime the the parameters in record() and measure(), while System.nanoTime itself does not necessarily represent the current time in nanoseconds, but just the cpu counter in the JVM in the granularity of nanoseconds. So when we use it to measure metadata age and compare the value with System.currentTimeMillis that will give us "undefined" behavior. In general, for our use case, I think we do not need to use System.nanoTime which is mainly for measuring elapsed time in very high accuracy, while we use the timestamp mainly for measuring rates, and window boundaries for samples. For those I think milliseconds are good enough. Jay Kreps What do you think?
        Hide
        jkreps Jay Kreps added a comment -

        My motivation was that I was under the impression that nanoTime was faster and more appropriate for timing things which is essentially what we are doing. Both the rate and window boundary cases are about ellapsed time, right? That said, I had no idea that linux gave such an odd result.

        Show
        jkreps Jay Kreps added a comment - My motivation was that I was under the impression that nanoTime was faster and more appropriate for timing things which is essentially what we are doing. Both the rate and window boundary cases are about ellapsed time, right? That said, I had no idea that linux gave such an odd result.
        Hide
        guozhang Guozhang Wang added a comment -

        After reading around on the web it seems to me that nanoTime is more expensive than currentTimeMillis.

        For elapsed time measurement, currently the timestamp is only used in three places:

        1) Sample.isComplete: now - lastWindow >= config.timeWindowNs

        2) SampledState.purgeObsoleteSamples(): now - sample.lastWindow >= expireAge

        3) Possibly used on future SampledStat's update/combine functions, currently none of them used the "now" timestamp.

        With these use cases I think most measured time spans are large enough for milliseconds.

        Another issue I saw is that currently the record() call expose the timestamp input parameter, while the measure() call used the System.nanoTime internally. And we require these two granularities (one used internally and another passed in by users) be consistent. I would prefer to have both of them not exposed to users. If people agree I can go ahead and make these changes.

        Show
        guozhang Guozhang Wang added a comment - After reading around on the web it seems to me that nanoTime is more expensive than currentTimeMillis. For elapsed time measurement, currently the timestamp is only used in three places: 1) Sample.isComplete: now - lastWindow >= config.timeWindowNs 2) SampledState.purgeObsoleteSamples(): now - sample.lastWindow >= expireAge 3) Possibly used on future SampledStat's update/combine functions, currently none of them used the "now" timestamp. With these use cases I think most measured time spans are large enough for milliseconds. Another issue I saw is that currently the record() call expose the timestamp input parameter, while the measure() call used the System.nanoTime internally. And we require these two granularities (one used internally and another passed in by users) be consistent. I would prefer to have both of them not exposed to users. If people agree I can go ahead and make these changes.
        Hide
        guozhang Guozhang Wang added a comment -

        I did a small benchmark on my mac laptop and linux desktop:

         public class NanoVsMillis {
             public static void main(String[] args) {
                 int count = 10 * 1000 * 1000;
                 long[] times = new long[count];
                 long t1 = System.nanoTime();
                 for (int i = 0; i < count; i++) {
                     times[i] = System.nanoTime();
                 }
                 long t2 = System.nanoTime();
                 for (int i = 0; i < count; i++) {
                     times[i] = System.currentTimeMillis();
                 }
                 long t3 = System.nanoTime();
                 System.out.println("Total time nano : " + ((t2 - t1) / (1000 * 1000)) + "ms");
                 System.out.println("Total time millis: " + ((t3 - t2) / (1000 * 1000)) + "ms");
                 System.out.println("");
                 System.out.println("Avg time nano : " + ((t2 - t1) / count) + "ns");
                 System.out.println("Avg time millis: " + ((t3 - t2) / count) + "ns");
             }
         }
        

        On the mac laptop, since "the QueryPerformanceCounter/ QueryPerformanceFrequency API is available, nanoTime actually returns currentTimeMillis*10^6). ", the results are

        Total time nano : 366ms
        Total time millis: 354ms
        
        Avg time nano : 36ns
        Avg time millis: 35ns
        

        On the linux machine, the results are

        Total time nano : 520ms
        Total time millis: 470ms
        
        Avg time nano : 52ns
        Avg time millis: 47ns
        

        So I think nanoTime will be a bit more expensive than millis on the linux box. If millis' accuracy is sufficient for us, maybe we should switch from nano?

        Show
        guozhang Guozhang Wang added a comment - I did a small benchmark on my mac laptop and linux desktop: public class NanoVsMillis { public static void main( String [] args) { int count = 10 * 1000 * 1000; long [] times = new long [count]; long t1 = System .nanoTime(); for ( int i = 0; i < count; i++) { times[i] = System .nanoTime(); } long t2 = System .nanoTime(); for ( int i = 0; i < count; i++) { times[i] = System .currentTimeMillis(); } long t3 = System .nanoTime(); System .out.println( "Total time nano : " + ((t2 - t1) / (1000 * 1000)) + "ms" ); System .out.println( "Total time millis: " + ((t3 - t2) / (1000 * 1000)) + "ms" ); System .out.println(""); System .out.println( "Avg time nano : " + ((t2 - t1) / count) + "ns" ); System .out.println( "Avg time millis: " + ((t3 - t2) / count) + "ns" ); } } On the mac laptop, since "the QueryPerformanceCounter/ QueryPerformanceFrequency API is available, nanoTime actually returns currentTimeMillis*10^6). ", the results are Total time nano : 366ms Total time millis: 354ms Avg time nano : 36ns Avg time millis: 35ns On the linux machine, the results are Total time nano : 520ms Total time millis: 470ms Avg time nano : 52ns Avg time millis: 47ns So I think nanoTime will be a bit more expensive than millis on the linux box. If millis' accuracy is sufficient for us, maybe we should switch from nano?
        Hide
        jkreps Jay Kreps added a comment -

        Yeah I'm not opposed to switching.

        Show
        jkreps Jay Kreps added a comment - Yeah I'm not opposed to switching.
        Hide
        guozhang Guozhang Wang added a comment -

        Created reviewboard against branch origin/trunk

        Show
        guozhang Guozhang Wang added a comment - Created reviewboard against branch origin/trunk
        Hide
        guozhang Guozhang Wang added a comment -

        Created reviewboard https://reviews.apache.org/r/20529/
        against branch origin/trunk

        Show
        guozhang Guozhang Wang added a comment - Created reviewboard https://reviews.apache.org/r/20529/ against branch origin/trunk
        Hide
        guozhang Guozhang Wang added a comment -

        Updated reviewboard https://reviews.apache.org/r/20529/
        against branch origin/trunk

        Show
        guozhang Guozhang Wang added a comment - Updated reviewboard https://reviews.apache.org/r/20529/ against branch origin/trunk
        Hide
        guozhang Guozhang Wang added a comment -

        Updated reviewboard against branch origin/trunk

        Show
        guozhang Guozhang Wang added a comment - Updated reviewboard against branch origin/trunk
        Hide
        guozhang Guozhang Wang added a comment -

        Updated reviewboard against branch origin/trunk

        Show
        guozhang Guozhang Wang added a comment - Updated reviewboard against branch origin/trunk
        Hide
        guozhang Guozhang Wang added a comment -

        Updated reviewboard https://reviews.apache.org/r/20529/
        against branch origin/trunk

        Show
        guozhang Guozhang Wang added a comment - Updated reviewboard https://reviews.apache.org/r/20529/ against branch origin/trunk
        Hide
        junrao Jun Rao added a comment -

        Thanks for the followup patch. +1 and committed to trunk.

        Show
        junrao Jun Rao added a comment - Thanks for the followup patch. +1 and committed to trunk.

          People

          • Assignee:
            guozhang Guozhang Wang
            Reporter:
            guozhang Guozhang Wang
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development