Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0, 2.0.3-alpha
    • Component/s: metrics
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Updated description based on feedback:

      We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes.
      The way this JIRA solves this problem is adding a new method publishMetricsNow() to the MetricsSystemImpl() class, that does a synchronous out-of-band push of the metrics from the sources to the sink. I also add a method to MetricsSinkAdapter (putMetricsImmediate) to support that change.

      1. HADOOP-9090.2.patch
        40 kB
        Mostafa Elhemali
      2. HADOOP-9090.branch-1.patch
        14 kB
        Suresh Srinivas
      3. HADOOP-9090.branch-1.patch
        14 kB
        Mostafa Elhemali
      4. HADOOP-9090.justEnhanceDefaultImpl.2.patch
        14 kB
        Mostafa Elhemali
      5. HADOOP-9090.justEnhanceDefaultImpl.3.patch
        18 kB
        Mostafa Elhemali
      6. HADOOP-9090.justEnhanceDefaultImpl.4.patch
        14 kB
        Mostafa Elhemali
      7. HADOOP-9090.justEnhanceDefaultImpl.5.patch
        19 kB
        Mostafa Elhemali
      8. HADOOP-9090.justEnhanceDefaultImpl.6.patch
        20 kB
        Mostafa Elhemali
      9. HADOOP-9090.justEnhanceDefaultImpl.7.patch
        20 kB
        Mostafa Elhemali
      10. HADOOP-9090.justEnhanceDefaultImpl.8.patch
        18 kB
        Mostafa Elhemali
      11. HADOOP-9090.justEnhanceDefaultImpl.9.patch
        18 kB
        Mostafa Elhemali
      12. HADOOP-9090.justEnhanceDefaultImpl.patch
        9 kB
        Mostafa Elhemali
      13. HADOOP-9090.patch
        40 kB
        Mostafa Elhemali

        Activity

        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12554808/HADOOP-9090.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1819//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/1819//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1819//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12554808/HADOOP-9090.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 5 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1819//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/1819//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1819//console This message is automatically generated.
        Hide
        Mostafa Elhemali added a comment -

        Patch with findbugs fixes.

        Show
        Mostafa Elhemali added a comment - Patch with findbugs fixes.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12554964/HADOOP-9090.2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1825//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1825//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12554964/HADOOP-9090.2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 5 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1825//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1825//console This message is automatically generated.
        Hide
        Luke Lu added a comment -

        I can imagine that you would want to control publishing the metrics at the end of a process. What about a longer running process that also want to periodically publish its metrics? Wouldn't it be better to enhance the metrics system to expose an ondemand publish and wait (until the metrics are sent or timed out) method, so the same metrics system can work with all processes? long running or not?

        Show
        Luke Lu added a comment - I can imagine that you would want to control publishing the metrics at the end of a process. What about a longer running process that also want to periodically publish its metrics? Wouldn't it be better to enhance the metrics system to expose an ondemand publish and wait (until the metrics are sent or timed out) method, so the same metrics system can work with all processes? long running or not?
        Hide
        Mostafa Elhemali added a comment -

        Thanks for the suggestion Luke! I've attached an alternative patch doing what you're suggesting: adding a publishMetricsNow() method that synchronously publishes all metrics on the same thread in the default implementation.

        The alternative patch is much simpler, and honestly I'm having a "why didn't I think of that?" moments right now. If people are OK with the change of the default implementation (and the addition of the new interface method) and I'm not missing any race conditions (I'll keep looking) then I think the simpler patch would work just fine for my purposes.

        Show
        Mostafa Elhemali added a comment - Thanks for the suggestion Luke! I've attached an alternative patch doing what you're suggesting: adding a publishMetricsNow() method that synchronously publishes all metrics on the same thread in the default implementation. The alternative patch is much simpler, and honestly I'm having a "why didn't I think of that?" moments right now. If people are OK with the change of the default implementation (and the addition of the new interface method) and I'm not missing any race conditions (I'll keep looking) then I think the simpler patch would work just fine for my purposes.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12555046/HADOOP-9090.justEnhanceDefaultImpl.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1828//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1828//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12555046/HADOOP-9090.justEnhanceDefaultImpl.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1828//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1828//console This message is automatically generated.
        Hide
        Luke Lu added a comment -

        The new patch is indeed much simpler I don't think the sinkLock is needed though, as publishMetrics is already synchronized. Another thing is error handling with publishMetricsNow. What if underlying sink hangs (due to network issues)? Do you want the process to potentially hang forever if the final metrics are not published?

        Show
        Luke Lu added a comment - The new patch is indeed much simpler I don't think the sinkLock is needed though, as publishMetrics is already synchronized. Another thing is error handling with publishMetricsNow. What if underlying sink hangs (due to network issues)? Do you want the process to potentially hang forever if the final metrics are not published?
        Hide
        Mostafa Elhemali added a comment -

        Thanks Luke. I've attached a new patch that guards against sinks taking too long (10 seconds by default) when consuming any single record in the publishMetricsNow() call.

        I do need to synchronize the publishing of metrics because publishNow() competes with the queue consumption thread for the sink. I've removed sinkLock though and just made the inner method synchronized (on the sink adapter itself) since no one else synchronizes on it so we there should be no contention.

        Show
        Mostafa Elhemali added a comment - Thanks Luke. I've attached a new patch that guards against sinks taking too long (10 seconds by default) when consuming any single record in the publishMetricsNow() call. I do need to synchronize the publishing of metrics because publishNow() competes with the queue consumption thread for the sink. I've removed sinkLock though and just made the inner method synchronized (on the sink adapter itself) since no one else synchronizes on it so we there should be no contention.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12555070/HADOOP-9090.justEnhanceDefaultImpl.2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1832//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1832//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12555070/HADOOP-9090.justEnhanceDefaultImpl.2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1832//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1832//console This message is automatically generated.
        Hide
        Luke Lu added a comment -

        I think I understand what you're doing now. I'm concerned about creating a thread for each record in the publishMetricsNow and the usage of thread interrupt/stop, which seems heavy handed and doesn't guaranteed to work either (setDaemon(true) on the worker thread would help though). How about make the putMetricsImediate call the regular putMetrics and wait for the publish to complete? The signalling is slightly tricky in this case. I'm sure you'll figure this out, though

        Show
        Luke Lu added a comment - I think I understand what you're doing now. I'm concerned about creating a thread for each record in the publishMetricsNow and the usage of thread interrupt/stop, which seems heavy handed and doesn't guaranteed to work either (setDaemon(true) on the worker thread would help though). How about make the putMetricsImediate call the regular putMetrics and wait for the publish to complete? The signalling is slightly tricky in this case. I'm sure you'll figure this out, though
        Hide
        Mostafa Elhemali added a comment -

        Thanks Luke. It's a fundamentally hard problem: if we assume that putMetrics() from arbitrary sinks can hang, then we'll have to interrupt the thread that doing the work and hope that unsticks it (that's how thread pools cancel work for example). Having said that, I got uneasy about using stop(), and I agree with your comment that I was creating too many threads willy-nilly. The new patch thus does two things:
        1. Creates a single thread (if needed) for on-demand publishing per sink, and reuses that for subsequent publishing. This should cut down considerably on how many threads are created.
        2. If it sees a sink timeout, it interrupts the worker thread but doesn't stop() it (joins it and hangs with it if needed).

        I personally think that's the most reliable thing we can do. One alternative as you say is to mark the worker thread as a daemon, so we don't have to join it and hang with it if it hangs, but just abandon it. The problem there is that sinks may be doing things to external systems that will leave them in inconsistent states if the process just shuts down while they're still working with no warning. Let me know what you think.

        Show
        Mostafa Elhemali added a comment - Thanks Luke. It's a fundamentally hard problem: if we assume that putMetrics() from arbitrary sinks can hang, then we'll have to interrupt the thread that doing the work and hope that unsticks it (that's how thread pools cancel work for example). Having said that, I got uneasy about using stop(), and I agree with your comment that I was creating too many threads willy-nilly. The new patch thus does two things: 1. Creates a single thread (if needed) for on-demand publishing per sink, and reuses that for subsequent publishing. This should cut down considerably on how many threads are created. 2. If it sees a sink timeout, it interrupts the worker thread but doesn't stop() it (joins it and hangs with it if needed). I personally think that's the most reliable thing we can do. One alternative as you say is to mark the worker thread as a daemon, so we don't have to join it and hang with it if it hangs, but just abandon it. The problem there is that sinks may be doing things to external systems that will leave them in inconsistent states if the process just shuts down while they're still working with no warning. Let me know what you think.
        Hide
        Luke Lu added a comment -

        I don't think you can control what happens to external systems, which should handle arbitrary connection errors etc by unexpected client (even router/firewall) shutdown. Another problem of the new patch is that you're creating a latch and a semaphore per record (vs per MetricsBuffer) and there can be hundreds (up to a few thousands) of records per put. If the sink hangs, you'll be recreating new thread/latch/semaphore per record and the user perceived timeout would be configured timeout * number of records. Another issue is that hanging can happen in sink.flush as well.

        Why not do the simple notification in the existing code like the following (untested sketch)?:

        boolean oobPut;
        
        // illustration only, should be in the ctor after retry* variables are defined
        final long OOB_PUT_TIMEOUT = retryDelay * Math.pow(retryBackoff, retryCount) * 1000;
        
        synchronized void putMetricsImmediate(MetricsBuffer mb) {
          if (!oobPut) {
            oobPut = true;
            if (queue.enqueue(buffer)) {
              wait(OOB_PUT_TIMEOUT);
              oobPut = false;
            } // otherwise queue is full due to sink issues anyway.
          } else { // another oobPut in progress
            wait(OOB_PUT_TIMEOUT);
            oobPut = false; // just in case
          }
        }
        
        // after queue.consumeAll(this); in publishMetricsFromQueue (needs to be synchronized now)
        if (oobPut) {
          notifyAll();
        }
        

        Now you get all the retry/timeout logic for free

        Show
        Luke Lu added a comment - I don't think you can control what happens to external systems, which should handle arbitrary connection errors etc by unexpected client (even router/firewall) shutdown. Another problem of the new patch is that you're creating a latch and a semaphore per record (vs per MetricsBuffer) and there can be hundreds (up to a few thousands) of records per put. If the sink hangs, you'll be recreating new thread/latch/semaphore per record and the user perceived timeout would be configured timeout * number of records. Another issue is that hanging can happen in sink.flush as well. Why not do the simple notification in the existing code like the following (untested sketch)?: boolean oobPut; // illustration only, should be in the ctor after retry* variables are defined final long OOB_PUT_TIMEOUT = retryDelay * Math .pow(retryBackoff, retryCount) * 1000; synchronized void putMetricsImmediate(MetricsBuffer mb) { if (!oobPut) { oobPut = true ; if (queue.enqueue(buffer)) { wait(OOB_PUT_TIMEOUT); oobPut = false ; } // otherwise queue is full due to sink issues anyway. } else { // another oobPut in progress wait(OOB_PUT_TIMEOUT); oobPut = false ; // just in case } } // after queue.consumeAll( this ); in publishMetricsFromQueue (needs to be synchronized now) if (oobPut) { notifyAll(); } Now you get all the retry/timeout logic for free
        Hide
        Mostafa Elhemali added a comment -

        (I highly appreciate the responsive feedback I'm getting from you Luke.)

        Thanks - you raise good points. My fear was that do we want the immediate sinks to be queued up or not, and do we want to leave the queue thread hanging if the sink times out or not. I think though I was swayed to your way of thinking by two things: a) the flush() problem you point out and b) that my previous patch would've left sinks possible called by multiple threads instead of just the queue thread which can be bad.

        I've implemented the spirit of what you said. I opted though to have the putMetricsImmediate() wait for its specific MetricsBuffer since I figured the way you had it may end up in race conditions if multiple threads are calling publishMetricsNow() at the same time. Let me know what you think - thanks!

        Show
        Mostafa Elhemali added a comment - (I highly appreciate the responsive feedback I'm getting from you Luke.) Thanks - you raise good points. My fear was that do we want the immediate sinks to be queued up or not, and do we want to leave the queue thread hanging if the sink times out or not. I think though I was swayed to your way of thinking by two things: a) the flush() problem you point out and b) that my previous patch would've left sinks possible called by multiple threads instead of just the queue thread which can be bad. I've implemented the spirit of what you said. I opted though to have the putMetricsImmediate() wait for its specific MetricsBuffer since I figured the way you had it may end up in race conditions if multiple threads are calling publishMetricsNow() at the same time. Let me know what you think - thanks!
        Hide
        Ravi Prakash added a comment -

        This could be very useful. Thanks for taking this up Mostafa.

        Minor nit. In MetricsSystem.java:

        public abstract void publishMetricsNow();
        

        IMHO we shouldn't put that method that high in the heirarchy. How would implementations of MetricsSystem not concerned with real-time implement this method?

        Does the description of this JIRA need an update? The classes aren't abstract after your patch.

        Otherwise code looks good to me.

        Show
        Ravi Prakash added a comment - This could be very useful. Thanks for taking this up Mostafa. Minor nit. In MetricsSystem.java: public abstract void publishMetricsNow(); IMHO we shouldn't put that method that high in the heirarchy. How would implementations of MetricsSystem not concerned with real-time implement this method? Does the description of this JIRA need an update? The classes aren't abstract after your patch. Otherwise code looks good to me.
        Hide
        Mostafa Elhemali added a comment -

        Thanks Ravi for the feedback - I knew it was a bit controversial to put this method in the MetricsSystem interface and require it from other systems, but I figured it's the only way for outside customer to really take advantage of this since MetricsSystemImpl is not intended for out-of-Hadoop consumption. Having said that, my immediate need would be met without putting it in the interface so I took that out for now (we can always add it in another explicit JIRA if needed).

        I've also added a new multi-threaded test in the new patch to make sure everything is alright there.

        Show
        Mostafa Elhemali added a comment - Thanks Ravi for the feedback - I knew it was a bit controversial to put this method in the MetricsSystem interface and require it from other systems, but I figured it's the only way for outside customer to really take advantage of this since MetricsSystemImpl is not intended for out-of-Hadoop consumption. Having said that, my immediate need would be met without putting it in the interface so I took that out for now (we can always add it in another explicit JIRA if needed). I've also added a new multi-threaded test in the new patch to make sure everything is alright there.
        Hide
        Luke Lu added a comment -

        Adding a publishMetricsNow method to the MetricsSystem is reasonable, as the interface is considered Evolving and the requirement has universal utility (I actually thought about adding it in the beginning but there was no such requirement then).

        I figured the way you had it may end up in race conditions if multiple threads are calling publishMetricsNow() at the same time.

        The sketch was meant to be simple and the race is considered harmless: it's ok to potentially exit before one of the metrics buffer that's almost the same time with the last one is flushed. OTOH, if you want to wait for individual metrics buffer you can do the following without a new wrapper:

        // in putMetricsImediate
        synchronized(buffer) {
          buffer.wait(oobTimeout);
        }
        
        // in consume
        synchronized(buffer) {
          buffer.notify();
        }
        
        Show
        Luke Lu added a comment - Adding a publishMetricsNow method to the MetricsSystem is reasonable, as the interface is considered Evolving and the requirement has universal utility (I actually thought about adding it in the beginning but there was no such requirement then). I figured the way you had it may end up in race conditions if multiple threads are calling publishMetricsNow() at the same time. The sketch was meant to be simple and the race is considered harmless: it's ok to potentially exit before one of the metrics buffer that's almost the same time with the last one is flushed. OTOH, if you want to wait for individual metrics buffer you can do the following without a new wrapper: // in putMetricsImediate synchronized (buffer) { buffer.wait(oobTimeout); } // in consume synchronized (buffer) { buffer.notify(); }
        Hide
        Mostafa Elhemali added a comment -

        Point about how to synchronize putMetricsImmediate
        Thanks Luke. Yeah I considered waiting on the buffer itself before creating the wrapper, but there are a couple of reasons I didn't end up doing that:
        1. (Main reason) The sink doesn't own the buffer object, so it doesn't know who else is waiting on it or notifying it. Seems wrong to presume to wait on it.
        2. Object.wait(timeout) doesn't return the result of the wait, so I wouldn't know if that succeeded or failed without additional complex logic.

        As for the race being harmless: I'm not sure it's that harmless. For all we know the buffers that were just processed from the queue were from ages ago, and the values in the new buffer are completely different. I'd much rather play it safe and give it an honest attempt to publish what I've just been given.

        So, for the reasons above I'd rather go with the wrapper despite the added code complexity.

        Point about putting the method in the interface
        OK since me & Luke are two votes to put the method in the interface, and Luke made a good point about the interface being evolving, I'll put the method back into the interface in a subsequent patch unless anyone else objects (or Ravi presses the point with other reasons). Thanks all.

        Show
        Mostafa Elhemali added a comment - Point about how to synchronize putMetricsImmediate Thanks Luke. Yeah I considered waiting on the buffer itself before creating the wrapper, but there are a couple of reasons I didn't end up doing that: 1. (Main reason) The sink doesn't own the buffer object, so it doesn't know who else is waiting on it or notifying it. Seems wrong to presume to wait on it. 2. Object.wait(timeout) doesn't return the result of the wait, so I wouldn't know if that succeeded or failed without additional complex logic. As for the race being harmless: I'm not sure it's that harmless. For all we know the buffers that were just processed from the queue were from ages ago, and the values in the new buffer are completely different. I'd much rather play it safe and give it an honest attempt to publish what I've just been given. So, for the reasons above I'd rather go with the wrapper despite the added code complexity. Point about putting the method in the interface OK since me & Luke are two votes to put the method in the interface, and Luke made a good point about the interface being evolving, I'll put the method back into the interface in a subsequent patch unless anyone else objects (or Ravi presses the point with other reasons). Thanks all.
        Hide
        Luke Lu added a comment -

        Good points, Mostafa. I should know that MetricsBuffer though "immutable" can be shared. I guess it was a kneejerk reaction java verbosity

        Anyway the new logic looks solid to me. Thanks!

        Show
        Luke Lu added a comment - Good points, Mostafa. I should know that MetricsBuffer though "immutable" can be shared. I guess it was a kneejerk reaction java verbosity Anyway the new logic looks solid to me. Thanks!
        Hide
        Ravi Prakash added a comment -

        Thanks Luke and Mostafa

        and the requirement has universal utility

        Agreed. But it also places restrictions on how scalable the system can be.

        I'm flexible with where you want to introduce that method. Even then, I would like the behavior of that method javadoc'ed explicitly stating what the expectation is if a MetricsSystem cannot provide real-time guarantees.

        Show
        Ravi Prakash added a comment - Thanks Luke and Mostafa and the requirement has universal utility Agreed. But it also places restrictions on how scalable the system can be. I'm flexible with where you want to introduce that method. Even then, I would like the behavior of that method javadoc'ed explicitly stating what the expectation is if a MetricsSystem cannot provide real-time guarantees.
        Hide
        Mostafa Elhemali added a comment -

        That's a fair request Ravi. I took a shot at documenting that in Javadoc on the method - does the wording look reasonable?

        /**

        • Requests an immediate publish of all metrics from sources to sinks.
        • This is a "soft" request: the expectation is that a best effort will be
        • done to synchronously snapshot the metrics from all the sources and put
        • them in all the sinks (including flushing the sinks) before returning to
        • the caller. If this can't be accomplished in reasonable time it's OK to
        • return to the caller before everything is done.
          */
        Show
        Mostafa Elhemali added a comment - That's a fair request Ravi. I took a shot at documenting that in Javadoc on the method - does the wording look reasonable? /** Requests an immediate publish of all metrics from sources to sinks. This is a "soft" request: the expectation is that a best effort will be done to synchronously snapshot the metrics from all the sources and put them in all the sinks (including flushing the sinks) before returning to the caller. If this can't be accomplished in reasonable time it's OK to return to the caller before everything is done. */
        Hide
        Ravi Prakash added a comment -

        Thanks Mostafa! That works for me. +1 from my side.

        Show
        Ravi Prakash added a comment - Thanks Mostafa! That works for me. +1 from my side.
        Hide
        Luke Lu added a comment -

        Mostafa, I'm still a little bothered by the fact that a wrapper object is created for per publish per sink in normal cases. Since MetricsBuffer is trivially decoratable, how about creating a WaitalbeMetricsBuffer for OOB case only?

        Show
        Luke Lu added a comment - Mostafa, I'm still a little bothered by the fact that a wrapper object is created for per publish per sink in normal cases. Since MetricsBuffer is trivially decoratable, how about creating a WaitalbeMetricsBuffer for OOB case only?
        Hide
        Mostafa Elhemali added a comment -

        Luke: is the code in the new patch what you have in mind?

        Show
        Mostafa Elhemali added a comment - Luke: is the code in the new patch what you have in mind?
        Hide
        Luke Lu added a comment -

        Why don't you just make WaitableMetricsBuffer extends MetricsBuffer? like this:

        static WaitableMetricsBuffer extends MetricsBuffer {
          WaitableMetricsBuffer(MetricsBuffer buffer) {
            super(buffer);
          }
          ...
        }
        

        Less change to existing code...

        Show
        Luke Lu added a comment - Why don't you just make WaitableMetricsBuffer extends MetricsBuffer? like this: static WaitableMetricsBuffer extends MetricsBuffer { WaitableMetricsBuffer(MetricsBuffer buffer) { super (buffer); } ... } Less change to existing code...
        Hide
        Mostafa Elhemali added a comment -

        Like this?

        Show
        Mostafa Elhemali added a comment - Like this?
        Hide
        Luke Lu added a comment -

        Thanks Mostafa! +1 on v8 pending jenkins.

        Show
        Luke Lu added a comment - Thanks Mostafa! +1 on v8 pending jenkins.
        Hide
        Suresh Srinivas added a comment -

        I kicked off a pre-commit build, since Jenkins had not picked up the new patch.

        Show
        Suresh Srinivas added a comment - I kicked off a pre-commit build, since Jenkins had not picked up the new patch.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12555596/HADOOP-9090.justEnhanceDefaultImpl.8.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1836//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/1836//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1836//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12555596/HADOOP-9090.justEnhanceDefaultImpl.8.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1836//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/1836//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1836//console This message is automatically generated.
        Hide
        Mostafa Elhemali added a comment -

        Thanks Suresh. FindBugs is correct, and publishMetricsFromQueue() doesn't actually need to be synchronized (it was made that way in an earlier iteration, but current implementation doesn't need it).

        Attached a new patch with the synchronized keyword removed from that method.

        Show
        Mostafa Elhemali added a comment - Thanks Suresh. FindBugs is correct, and publishMetricsFromQueue() doesn't actually need to be synchronized (it was made that way in an earlier iteration, but current implementation doesn't need it). Attached a new patch with the synchronized keyword removed from that method.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12555692/HADOOP-9090.justEnhanceDefaultImpl.9.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1837//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1837//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12555692/HADOOP-9090.justEnhanceDefaultImpl.9.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1837//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1837//console This message is automatically generated.
        Hide
        Suresh Srinivas added a comment -

        I committed the patch to trunk and branch-2. Thank you Mostafa.

        Show
        Suresh Srinivas added a comment - I committed the patch to trunk and branch-2. Thank you Mostafa.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3085 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3085/)
        HADOOP-9090. Support on-demand publish of metrics. Contributed by Mostafa Elhemali. (Revision 1416538)

        Result = SUCCESS
        suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1416538
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/MetricsSystem.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSinkAdapter.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestGangliaMetrics.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestMetricsSystemImpl.java
        Show
        Hudson added a comment - Integrated in Hadoop-trunk-Commit #3085 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3085/ ) HADOOP-9090 . Support on-demand publish of metrics. Contributed by Mostafa Elhemali. (Revision 1416538) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1416538 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/MetricsSystem.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSinkAdapter.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestGangliaMetrics.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestMetricsSystemImpl.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #56 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/56/)
        HADOOP-9090. Support on-demand publish of metrics. Contributed by Mostafa Elhemali. (Revision 1416538)

        Result = SUCCESS
        suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1416538
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/MetricsSystem.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSinkAdapter.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestGangliaMetrics.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestMetricsSystemImpl.java
        Show
        Hudson added a comment - Integrated in Hadoop-Yarn-trunk #56 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/56/ ) HADOOP-9090 . Support on-demand publish of metrics. Contributed by Mostafa Elhemali. (Revision 1416538) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1416538 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/MetricsSystem.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSinkAdapter.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestGangliaMetrics.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestMetricsSystemImpl.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1246 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1246/)
        HADOOP-9090. Support on-demand publish of metrics. Contributed by Mostafa Elhemali. (Revision 1416538)

        Result = FAILURE
        suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1416538
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/MetricsSystem.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSinkAdapter.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestGangliaMetrics.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestMetricsSystemImpl.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1246 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1246/ ) HADOOP-9090 . Support on-demand publish of metrics. Contributed by Mostafa Elhemali. (Revision 1416538) Result = FAILURE suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1416538 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/MetricsSystem.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSinkAdapter.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestGangliaMetrics.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestMetricsSystemImpl.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1277 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1277/)
        HADOOP-9090. Support on-demand publish of metrics. Contributed by Mostafa Elhemali. (Revision 1416538)

        Result = SUCCESS
        suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1416538
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/MetricsSystem.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSinkAdapter.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestGangliaMetrics.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestMetricsSystemImpl.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1277 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1277/ ) HADOOP-9090 . Support on-demand publish of metrics. Contributed by Mostafa Elhemali. (Revision 1416538) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1416538 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/MetricsSystem.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSinkAdapter.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestGangliaMetrics.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestMetricsSystemImpl.java
        Hide
        Mostafa Elhemali added a comment -

        Attaching a patch for branch-1.

        Show
        Mostafa Elhemali added a comment - Attaching a patch for branch-1.
        Hide
        Suresh Srinivas added a comment -

        Small edits to the patch to ensure indentation matches with the original patch.

        +1.

        Luke you want to take quick look? I will commit this by the end of the day.

        Show
        Suresh Srinivas added a comment - Small edits to the patch to ensure indentation matches with the original patch. +1. Luke you want to take quick look? I will commit this by the end of the day.
        Hide
        Luke Lu added a comment -

        The latest branch-1 patch lgtm as well.

        Show
        Luke Lu added a comment - The latest branch-1 patch lgtm as well.
        Hide
        Suresh Srinivas added a comment -

        I committed this patch to branch-1 as well.

        Show
        Suresh Srinivas added a comment - I committed this patch to branch-1 as well.

          People

          • Assignee:
            Mostafa Elhemali
            Reporter:
            Mostafa Elhemali
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development