Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-13677

All Metrics Gauges should be unregistered by components that registered them

Details

    • Improvement
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • None
    • 8.3
    • metrics
    • None

    Description

      The life cycle of Metrics producers are managed by the core (mostly). So, if the lifecycle of the object is different from that of the core itself, these objects will never be unregistered from the metrics registry. This will lead to memory leaks

      Attachments

        1. SOLR-13677.patch
          124 kB
          Andrzej Bialecki

        Issue Links

          Activity

            Commit 3ce75aac49c79a023a9f1519badfe769e6a8f797 in lucene-solr's branch refs/heads/jira/SOLR-13677 from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3ce75aa ]

            SOLR-13677: initial commit

            jira-bot ASF subversion and git services added a comment - Commit 3ce75aac49c79a023a9f1519badfe769e6a8f797 in lucene-solr's branch refs/heads/jira/ SOLR-13677 from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3ce75aa ] SOLR-13677 : initial commit
            janhoy Jan Høydahl added a comment -

            Noble, is there a reason why you push your feature branch to Apache GIT instead of keeping it in your own fork and do a PR when it’s ready? Do you expect there to be collaboration? Reason I ask is that it adds some “noise” to the lists for every push, merge etc.

            janhoy Jan Høydahl added a comment - Noble, is there a reason why you push your feature branch to Apache GIT instead of keeping it in your own fork and do a PR when it’s ready? Do you expect there to be collaboration? Reason I ask is that it adds some “noise” to the lists for every push, merge etc.
            noble.paul Noble Paul added a comment -

            What we need is avoid notifications for commits to all the JIRA branches . Using a feature branch is good for collaboration. Every committer automatically had access to your branch

            noble.paul Noble Paul added a comment - What we need is avoid notifications for commits to all the JIRA branches . Using a feature branch is good for collaboration. Every committer automatically had access to your branch

            The current patch really complicates the API and the expectations on the implementors. I think we need a different strategy.

            How about the following:

            • in SolrMetricProducer instead of using AutoCloseable.close() simply add a different default method eg. unregisterMetrics(). Chances are that many of existing components already implement close() so the implementors will not be aware that they need to also unregister the component's metrics.
            • I don't think we should expose GaugeWrapper outside of SolrMetricManager, this is really an internal detail. Adding the method above should be sufficient to explicitly unregister the metrics - and all SolrInfoBean-s already know what metric names (and what tag) they registered, because the tag is passed in initializeMetrics and the names are recorded in registerMetricName.
            ab Andrzej Bialecki added a comment - The current patch really complicates the API and the expectations on the implementors. I think we need a different strategy. How about the following: in SolrMetricProducer instead of using AutoCloseable.close() simply add a different default method eg. unregisterMetrics() . Chances are that many of existing components already implement close() so the implementors will not be aware that they need to also unregister the component's metrics. I don't think we should expose GaugeWrapper outside of SolrMetricManager, this is really an internal detail. Adding the method above should be sufficient to explicitly unregister the metrics - and all SolrInfoBean -s already know what metric names (and what tag) they registered, because the tag is passed in initializeMetrics and the names are recorded in registerMetricName .
            noble.paul Noble Paul added a comment - - edited

            In SolrMetricProducer instead of using AutoCloseable.close() simply add a different default method eg. unregisterMetrics().

            I thought of doing it. But the problem is that there is a chance that unregisterMetrics() is not invoked at all, as it is not the responsibility of the metrics framework. AutoCloseable.close() is called anyway. All the cleanup should be performed there. Even if we have an unregisterMetrics() method, we should automatically call it from the AutoCloseable.close() method.

            I don't think we should expose GaugeWrapper outside of SolrMetricManager, this is really an internal detail.

            I have changed the branch to not expose the GaugeWrapper

            noble.paul Noble Paul added a comment - - edited In SolrMetricProducer instead of using AutoCloseable.close() simply add a different default method eg. unregisterMetrics() . I thought of doing it. But the problem is that there is a chance that unregisterMetrics() is not invoked at all, as it is not the responsibility of the metrics framework. AutoCloseable.close() is called anyway. All the cleanup should be performed there. Even if we have an unregisterMetrics() method, we should automatically call it from the AutoCloseable.close() method. I don't think we should expose GaugeWrapper outside of SolrMetricManager , this is really an internal detail. I have changed the branch to not expose the GaugeWrapper

            Commit f983ede5fc7d14cb004d1e8674b709ffefd8b06e in lucene-solr's branch refs/heads/jira/SOLR-13677 from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f983ede ]

            SOLR-13677: Created an interface called GaugeRef instead of exposing the internal class GaugeWrapper

            jira-bot ASF subversion and git services added a comment - Commit f983ede5fc7d14cb004d1e8674b709ffefd8b06e in lucene-solr's branch refs/heads/jira/ SOLR-13677 from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f983ede ] SOLR-13677 : Created an interface called GaugeRef instead of exposing the internal class GaugeWrapper
            - public void registerGauge(SolrInfoBean info, String registry, Gauge<?> gauge, String tag, boolean force, String metricName, String... metricPath) {
            + public GaugeRef registerGauge(SolrInfoBean info, String registry, Gauge<?> gauge, String tag, boolean force, String metricName, String... metricPath) {
            

            The above registerGauge method change to make it return a gauge reference encourages but does not ensure that the caller 'remembers' the reference and so that it is later then included in the unregister calls. There's also a small amount of repetition w.r.t. iterating over the myGauges collection and unregistering the elements.

            I wonder if some sort of container or wrapper class might be helpful i.e. SolrMetricManager.registerGauge would be sure to call 'remember' for the gauge and the close(?) method of the producer would call the 'forgetAll' method. What do you think?

            + class FooBar {
            +   private final List<GaugeRef> gaugeRefs = new ArrayList<>();
            +   void remember(GaugeRef gaugeRef) {
            +     gaugeRefs.add(gaugeRef);
            +   }
            +   void forgetAll() {
            +     for (GaugeRef gaugeRef : gaugeRefs) {
            +       gaugeRef.unregister();
            +     }
            +     gaugeRefs.clear();
            +   }
            + }
            + 
            + public void registerGauge(FooBar memory, SolrInfoBean info, String registry, Gauge<?> gauge, String tag, boolean force, String metricName, String... metricPath) {
            +   memory.remember(registerGauge(info, registry, gauge, tag, force, metricName, metricPath));
            + }
            +
            + private GaugeRef registerGauge(SolrInfoBean info, String registry, Gauge<?> gauge, String tag, boolean force, String metricName, String... metricPath) {
            - public void registerGauge(SolrInfoBean info, String registry, Gauge<?> gauge, String tag, boolean force, String metricName, String... metricPath) {
                ...
            
            cpoerschke Christine Poerschke added a comment - - public void registerGauge(SolrInfoBean info, String registry, Gauge<?> gauge, String tag, boolean force, String metricName, String ... metricPath) { + public GaugeRef registerGauge(SolrInfoBean info, String registry, Gauge<?> gauge, String tag, boolean force, String metricName, String ... metricPath) { The above registerGauge method change to make it return a gauge reference encourages but does not ensure that the caller 'remembers' the reference and so that it is later then included in the unregister calls. There's also a small amount of repetition w.r.t. iterating over the myGauges collection and unregistering the elements. I wonder if some sort of container or wrapper class might be helpful i.e. SolrMetricManager.registerGauge would be sure to call 'remember' for the gauge and the close(?) method of the producer would call the 'forgetAll' method. What do you think? + class FooBar { + private final List<GaugeRef> gaugeRefs = new ArrayList<>(); + void remember(GaugeRef gaugeRef) { + gaugeRefs.add(gaugeRef); + } + void forgetAll() { + for (GaugeRef gaugeRef : gaugeRefs) { + gaugeRef.unregister(); + } + gaugeRefs.clear(); + } + } + + public void registerGauge(FooBar memory, SolrInfoBean info, String registry, Gauge<?> gauge, String tag, boolean force, String metricName, String ... metricPath) { + memory.remember(registerGauge(info, registry, gauge, tag, force, metricName, metricPath)); + } + + private GaugeRef registerGauge(SolrInfoBean info, String registry, Gauge<?> gauge, String tag, boolean force, String metricName, String ... metricPath) { - public void registerGauge(SolrInfoBean info, String registry, Gauge<?> gauge, String tag, boolean force, String metricName, String ... metricPath) { ...

            Thanks Christine, this looks somewhat cleaner.

            For historical reasons (a complicated refactoring of SolrInfoMBean, UI dependencies, JMX, etc) some of the methods one would expect from SolrMetricProducer ended up in SolrInfoBean instead, and this is also the interface that is passed to registerGauge. Unless we want to do a larger refactoring now, we could treat SolrInfoBean as the memory, and add the default methods for remembering and forgetting the gauges to this interface.

            ab Andrzej Bialecki added a comment - Thanks Christine, this looks somewhat cleaner. For historical reasons (a complicated refactoring of SolrInfoMBean, UI dependencies, JMX, etc) some of the methods one would expect from SolrMetricProducer ended up in SolrInfoBean instead, and this is also the interface that is passed to registerGauge . Unless we want to do a larger refactoring now, we could treat SolrInfoBean as the memory, and add the default methods for remembering and forgetting the gauges to this interface.

            Also, GaugeRef can be an interface that GaugeWrapper implements, then there's no need to create even more objects.

            ab Andrzej Bialecki added a comment - Also, GaugeRef can be an interface that GaugeWrapper implements, then there's no need to create even more objects.
            noble.paul Noble Paul added a comment -

            cpoerschke, the problem is not with adding those remember() forget() methods. There is no way to ensure that those methods are invoked . We may be able to check them today. But who will ensure that about the future components. The advantage of piggybacking on close() is that it is a well known and usually close is called always

            noble.paul Noble Paul added a comment - cpoerschke , the problem is not with adding those remember() forget() methods. There is no way to ensure that those methods are invoked . We may be able to check them today. But who will ensure that about the future components. The advantage of piggybacking on close() is that it is a well known and usually close is called always

            usually close is called always

            There's no hard guarantee in any of these scenarios, just a degree of likelihood.

            I think that the patch over-complicates things by keeping an explicit reference to the gauges. It should not be needed if we properly use the tag argument when registering gauges.

            The tag attribute was added to solve a problem of non-deterministic order of gauge registration during core reload - the new core would already register some of the new gauges but the old core would linger for a while and when it tried to unregister gauges with the same names it would unregister the new ones instead of the old ones.

            So the important thing about the tag argument in SolrMetricManager.registerGauge and SolrMetricManager.unregisterGauges is this: the tag represents an object or a group of objects with the same life-cycle. Until now the tag was generated by SolrCore and passed to all its components because they had the same lifecycle. Also, it was SolrCore (via SolrCoreMetricManager) that would call unregisterGauges on behalf of all its components.

            Now, if the life-cycle of components is different from that of SolrCore then we need to make sure of two things:

            • the tag properly represents the object or group of objects with the same life-cycle - so if eg. SolrCache-s can be re-loaded without reloading SolrCore then they should no longer use the same tag as their parent SolrCore. Instead they should generate their own tags.
            • each component must be now responsible for unregistering its own gauges, as identified by its own tag. We can strongly encourage implementors to do this in each component's AutoCloseable.close() but I don't see any easy way to actually enforce it.

            This approach doesn't require keeping actual references to gauges in each component.

            For convenience we could also extend the concept of tag so that it's multi-valued - eg. a cache would use its own tag and the parent SolrCore-s tag. This way both the cache and the SolrCore could each easily unregister gauges that they (or their parent) created.

            ab Andrzej Bialecki added a comment - usually close is called always There's no hard guarantee in any of these scenarios, just a degree of likelihood. I think that the patch over-complicates things by keeping an explicit reference to the gauges. It should not be needed if we properly use the tag argument when registering gauges. The tag attribute was added to solve a problem of non-deterministic order of gauge registration during core reload - the new core would already register some of the new gauges but the old core would linger for a while and when it tried to unregister gauges with the same names it would unregister the new ones instead of the old ones. So the important thing about the tag argument in SolrMetricManager.registerGauge and SolrMetricManager.unregisterGauges is this: the tag represents an object or a group of objects with the same life-cycle. Until now the tag was generated by SolrCore and passed to all its components because they had the same lifecycle. Also, it was SolrCore (via SolrCoreMetricManager ) that would call unregisterGauges on behalf of all its components. Now, if the life-cycle of components is different from that of SolrCore then we need to make sure of two things: the tag properly represents the object or group of objects with the same life-cycle - so if eg. SolrCache-s can be re-loaded without reloading SolrCore then they should no longer use the same tag as their parent SolrCore. Instead they should generate their own tags. each component must be now responsible for unregistering its own gauges, as identified by its own tag. We can strongly encourage implementors to do this in each component's AutoCloseable.close() but I don't see any easy way to actually enforce it. This approach doesn't require keeping actual references to gauges in each component. For convenience we could also extend the concept of tag so that it's multi-valued - eg. a cache would use its own tag and the parent SolrCore-s tag. This way both the cache and the SolrCore could each easily unregister gauges that they (or their parent) created.
            noble.paul Noble Paul added a comment -

            the tag properly represents the object or group of objects with the same life-cycle -...

            I think this is a better solution. I was not sure about the implications of such a change. I will try to implement it and you can review it

            noble.paul Noble Paul added a comment - the tag properly represents the object or group of objects with the same life-cycle -... I think this is a better solution. I was not sure about the implications of such a change. I will try to implement it and you can review it

            Commit 9b0003a7037206d937b9f4aa48e5dc4cf80fdd0f in lucene-solr's branch refs/heads/jira/SOLR-13677_1 from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9b0003a ]

            SOLR-13677: Take 2

            jira-bot ASF subversion and git services added a comment - Commit 9b0003a7037206d937b9f4aa48e5dc4cf80fdd0f in lucene-solr's branch refs/heads/jira/ SOLR-13677 _1 from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9b0003a ] SOLR-13677 : Take 2
            noble.paul Noble Paul added a comment -

            ab please take a look at the new PR

            noble.paul Noble Paul added a comment - ab please take a look at the new PR

            Commit 956f61dde9b1ce3fee6b609955d41d0b90c67285 in lucene-solr's branch refs/heads/jira/SOLR-13677_1 from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=956f61d ]

            SOLR-13677: minimize changes

            jira-bot ASF subversion and git services added a comment - Commit 956f61dde9b1ce3fee6b609955d41d0b90c67285 in lucene-solr's branch refs/heads/jira/ SOLR-13677 _1 from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=956f61d ] SOLR-13677 : minimize changes

            Commit 2a7d2df5ff52bb09daf3223c4144aaf6964acdfe in lucene-solr's branch refs/heads/jira/SOLR-13677_2 from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2a7d2df ]

            SOLR-13677: new fresh branch

            jira-bot ASF subversion and git services added a comment - Commit 2a7d2df5ff52bb09daf3223c4144aaf6964acdfe in lucene-solr's branch refs/heads/jira/ SOLR-13677 _2 from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2a7d2df ] SOLR-13677 : new fresh branch

            Commit 7415fe453972fb23debd4f570599b5eeb0376ecb in lucene-solr's branch refs/heads/master from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7415fe4 ]

            SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them (#836)

            • SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them
            jira-bot ASF subversion and git services added a comment - Commit 7415fe453972fb23debd4f570599b5eeb0376ecb in lucene-solr's branch refs/heads/master from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7415fe4 ] SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them (#836) SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them

            Commit 7415fe453972fb23debd4f570599b5eeb0376ecb in lucene-solr's branch refs/heads/master from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7415fe4 ]

            SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them (#836)

            • SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them
            jira-bot ASF subversion and git services added a comment - Commit 7415fe453972fb23debd4f570599b5eeb0376ecb in lucene-solr's branch refs/heads/master from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7415fe4 ] SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them (#836) SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them

            Commit 6c94f659d25128614f7241c0c0e4f310e6737e36 in lucene-solr's branch refs/heads/master from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6c94f65 ]

            SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them

            jira-bot ASF subversion and git services added a comment - Commit 6c94f659d25128614f7241c0c0e4f310e6737e36 in lucene-solr's branch refs/heads/master from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6c94f65 ] SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them

            Commit 1c0490c9ab472327f1267ed46f5ba844de39ae77 in lucene-solr's branch refs/heads/branch_8x from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1c0490c ]

            SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them (#836)

            • SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them
            jira-bot ASF subversion and git services added a comment - Commit 1c0490c9ab472327f1267ed46f5ba844de39ae77 in lucene-solr's branch refs/heads/branch_8x from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1c0490c ] SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them (#836) SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them

            Commit 1c0490c9ab472327f1267ed46f5ba844de39ae77 in lucene-solr's branch refs/heads/branch_8x from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1c0490c ]

            SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them (#836)

            • SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them
            jira-bot ASF subversion and git services added a comment - Commit 1c0490c9ab472327f1267ed46f5ba844de39ae77 in lucene-solr's branch refs/heads/branch_8x from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1c0490c ] SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them (#836) SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them

            Commit bad55fd51a0b83d2ef632e72a2d0b5e0cb30f88e in lucene-solr's branch refs/heads/branch_8x from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bad55fd ]

            SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them

            jira-bot ASF subversion and git services added a comment - Commit bad55fd51a0b83d2ef632e72a2d0b5e0cb30f88e in lucene-solr's branch refs/heads/branch_8x from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bad55fd ] SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them

            Commit 7415fe453972fb23debd4f570599b5eeb0376ecb in lucene-solr's branch refs/heads/jira/SOLR-13650 from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7415fe4 ]

            SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them (#836)

            • SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them
            jira-bot ASF subversion and git services added a comment - Commit 7415fe453972fb23debd4f570599b5eeb0376ecb in lucene-solr's branch refs/heads/jira/ SOLR-13650 from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7415fe4 ] SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them (#836) SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them

            Commit 7415fe453972fb23debd4f570599b5eeb0376ecb in lucene-solr's branch refs/heads/jira/SOLR-13650 from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7415fe4 ]

            SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them (#836)

            • SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them
            jira-bot ASF subversion and git services added a comment - Commit 7415fe453972fb23debd4f570599b5eeb0376ecb in lucene-solr's branch refs/heads/jira/ SOLR-13650 from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7415fe4 ] SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them (#836) SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them

            Commit 6c94f659d25128614f7241c0c0e4f310e6737e36 in lucene-solr's branch refs/heads/jira/SOLR-13650 from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6c94f65 ]

            SOLR-13677: All Metrics Gauges should be unregistered by the objects that registered them

            jira-bot ASF subversion and git services added a comment - Commit 6c94f659d25128614f7241c0c0e4f310e6737e36 in lucene-solr's branch refs/heads/jira/ SOLR-13650 from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6c94f65 ] SOLR-13677 : All Metrics Gauges should be unregistered by the objects that registered them

            There were several issues that we pointed out in your patch before the last refactoring - and yet after the last refactoring you didn't give anyone a chance to review your changes before committing.

            I object to this on principle - knowing that there are issues to be resolved you should've waited a reasonable time for review, and 5 hours is not reasonable, especially during summer holidays. On this grounds alone I'd like to request that you revert these commits.

            However, in addition to that there are still several issues remaining in the committed changes that need to be fixed - and it would be just easier and cleaner to do this by reverting, fixing and committing them again after review:

            • SolrMetrics is a misleading name - it implies more functionality than it really provides, which is just a dumb context to keep several references together. Previous patch used MetricsInfo which was not ideal but better that this. I propose SolrMetricsContext.
            • Also, the class and its methods have 0 javadocs - I hope that one day this will actually become a precommit error.
            • looking at usages of this class it seems to me that perhaps it should not include scope - take for example SolrCoreMetricManager.registerMetricProducer (which hasn't been converted yet to the new API!), or any other use of the old SolrMetricProducer.initializeMetrics: it would be so much easier to create just one SolrMetrics object and pass it to each producer, regardless of its scope.
            • I don't understand the need to keep a reference SolrMetrics.parent. This is used only in one place in MetricsHandlerTest.RefreshablePluginHolder.closeHandler(), which looks like a hack and has no way of working anyway (comparing hex hashCode to plain int hashCode).
            • GaugeWrapper.unregisterGauges: expression wrapper.getTag().contains(tagSegment) already includes tagSegment.equals(wrapper.getTag()).
            • I don't understand the conditional in SolrMetricProducer.close() - basically it means that you unregister on close only non-root components and all their children. Why exclude root components?
            • SolrMetrics.gauge really doesn't need to mess with explicitly creating a GaugeWrapper, and the new (undocumented) method in SolrMetricManager that you added is dangerous - because it exposes the ability to easily register unwrapped gauges. Instead SolrMetric.gauge should simply delegate to SolrMetricManager.registerGauge. Also, providing a null metricName is an error and should throw an exception - is there any situation where you would want that?
            • I don't see any need for the newly added methods in SolrMetricManager - existing methods were sufficient for implementing the functionality in SolrMetrics. Please remove them, they only increase the size of the API without adding any benefit.
            • SolrMetricManager.unregisterGauges(String registryName, String tagSegment) should have some javadoc now, because it's not obvious what is a "tagSegment".
            • SolrMetricProducer.getUniqueMetricTag has a misleading javadoc comment - it does not produce a metric name but a tag for tying together objects of the same life-cycle.
            • Additionally, if parentName.contains(name) should be an error - I don't see when it would be a valid occurrence?
            • SolrMetricProducer.initializeMetrics(SolrMetrics) has this peculiar message in the exception: ""This means , the class has not implemented both of these methods". We can do so much better here...
            • the separator for tag hierarchy is ":" even though the javadoc claims it's "A.B.C". FWIW I think ":" is a better choice, so just the javadoc needs to be fixed.
            • SolrMetricProducer.getMetrics() should not have a default implementation on master, only on branch_8x. Also, the name of the method should reflect the name of the object it returns, whatever we end up with.
            • similarly, SolrMetricProducer.initializeMetrics(SolrMetricManager ...) should be deprecated on branch_8x and removed from master.
            • PluginBag.put(String, T) should not take care of closing old instances of plugins - instead the corresponding PluginBag.put(String, PluginHolder<T>) (which is also called directly from SolrCore) should do this, otherwise in some cases the old plugin will be closed and in some others it won't. This method should also check for object identity, if the code re-registers the same instance of plugin holder.
            • RequestHandlerBase.getMetricRegistry will throw an NPE when called before initializeMetrics is called.
            • why does SuggestComponent still call metricsInfo.metricManager.registerGauge instead of metricsInfo.gauge ? BTW, the name of this field should be changed to reflect the name of the SolrMetrics class.
            • FastLRUCache.close() and the same method in other caches should simply call SolrMetricProducer.super.close() instead of replicating the logic from that method.
            • there are no unit tests for the new functionality of managing gauges with hierarchical metric tags.
            • the new field SolrMetrics metrics should be declared together with the declarations of other fields.
            • I see that in general many SolrMetricProducer implementations on master still use the old API - they should be converted to the new API if we're serious about using it and deprecating the old one. This can be done in a separate issue.
            ab Andrzej Bialecki added a comment - There were several issues that we pointed out in your patch before the last refactoring - and yet after the last refactoring you didn't give anyone a chance to review your changes before committing. I object to this on principle - knowing that there are issues to be resolved you should've waited a reasonable time for review, and 5 hours is not reasonable, especially during summer holidays. On this grounds alone I'd like to request that you revert these commits. However, in addition to that there are still several issues remaining in the committed changes that need to be fixed - and it would be just easier and cleaner to do this by reverting, fixing and committing them again after review: SolrMetrics is a misleading name - it implies more functionality than it really provides, which is just a dumb context to keep several references together. Previous patch used MetricsInfo which was not ideal but better that this. I propose SolrMetricsContext . Also, the class and its methods have 0 javadocs - I hope that one day this will actually become a precommit error. looking at usages of this class it seems to me that perhaps it should not include scope - take for example SolrCoreMetricManager.registerMetricProducer (which hasn't been converted yet to the new API!), or any other use of the old SolrMetricProducer.initializeMetrics : it would be so much easier to create just one SolrMetrics object and pass it to each producer, regardless of its scope. I don't understand the need to keep a reference SolrMetrics.parent . This is used only in one place in MetricsHandlerTest.RefreshablePluginHolder.closeHandler() , which looks like a hack and has no way of working anyway (comparing hex hashCode to plain int hashCode). GaugeWrapper.unregisterGauges : expression wrapper.getTag().contains(tagSegment) already includes tagSegment.equals(wrapper.getTag()) . I don't understand the conditional in SolrMetricProducer.close() - basically it means that you unregister on close only non-root components and all their children. Why exclude root components? SolrMetrics.gauge really doesn't need to mess with explicitly creating a GaugeWrapper , and the new (undocumented) method in SolrMetricManager that you added is dangerous - because it exposes the ability to easily register unwrapped gauges. Instead SolrMetric.gauge should simply delegate to SolrMetricManager.registerGauge . Also, providing a null metricName is an error and should throw an exception - is there any situation where you would want that? I don't see any need for the newly added methods in SolrMetricManager - existing methods were sufficient for implementing the functionality in SolrMetrics . Please remove them, they only increase the size of the API without adding any benefit. SolrMetricManager.unregisterGauges(String registryName, String tagSegment) should have some javadoc now, because it's not obvious what is a "tagSegment". SolrMetricProducer.getUniqueMetricTag has a misleading javadoc comment - it does not produce a metric name but a tag for tying together objects of the same life-cycle. Additionally, if parentName.contains(name) should be an error - I don't see when it would be a valid occurrence? SolrMetricProducer.initializeMetrics(SolrMetrics) has this peculiar message in the exception: ""This means , the class has not implemented both of these methods". We can do so much better here... the separator for tag hierarchy is ":" even though the javadoc claims it's "A.B.C". FWIW I think ":" is a better choice, so just the javadoc needs to be fixed. SolrMetricProducer.getMetrics() should not have a default implementation on master, only on branch_8x. Also, the name of the method should reflect the name of the object it returns, whatever we end up with. similarly, SolrMetricProducer.initializeMetrics(SolrMetricManager ...) should be deprecated on branch_8x and removed from master. PluginBag.put(String, T) should not take care of closing old instances of plugins - instead the corresponding PluginBag.put(String, PluginHolder<T>) (which is also called directly from SolrCore) should do this, otherwise in some cases the old plugin will be closed and in some others it won't. This method should also check for object identity, if the code re-registers the same instance of plugin holder. RequestHandlerBase.getMetricRegistry will throw an NPE when called before initializeMetrics is called. why does SuggestComponent still call metricsInfo.metricManager.registerGauge instead of metricsInfo.gauge ? BTW, the name of this field should be changed to reflect the name of the SolrMetrics class. FastLRUCache.close() and the same method in other caches should simply call SolrMetricProducer.super.close() instead of replicating the logic from that method. there are no unit tests for the new functionality of managing gauges with hierarchical metric tags. the new field SolrMetrics metrics should be declared together with the declarations of other fields. I see that in general many SolrMetricProducer implementations on master still use the old API - they should be converted to the new API if we're serious about using it and deprecating the old one. This can be done in a separate issue.

            noble.paul - please revert these changes until the above problems are addressed. I'll be happy to help you fix them - let's create a branch. 

            ab Andrzej Bialecki added a comment - noble.paul  - please revert these changes until the above problems are addressed. I'll be happy to help you fix them - let's create a branch. 
            noble.paul Noble Paul added a comment -
            noble.paul Noble Paul added a comment - let's make the changes/reverts in  https://github.com/apache/lucene-solr/tree/jira/SOLR-13677_3 . 

            I marked this as blocker for 8.3 to make sure the changes already committed are fixed before the release.

            ab Andrzej Bialecki added a comment - I marked this as blocker for 8.3 to make sure the changes already committed are fixed before the release.
            markrmiller@gmail.com Mark Miller added a comment -

            Reverts should take place right away, else we can easily end up in a bad situation. Development should be worked out on the branch, not in our release branches.

            Please let's do a simple and fast revert and then commit when we have consensus.

            markrmiller@gmail.com Mark Miller added a comment - Reverts should take place right away, else we can easily end up in a bad situation. Development should be worked out on the branch, not in our release branches. Please let's do a simple and fast revert and then commit when we have consensus.
            dsmiley David Smiley added a comment -

            Yes, lets revert this now then. I appreciate that ab is setting a good bar for quality software!

            dsmiley David Smiley added a comment - Yes, lets revert this now then . I appreciate that ab is setting a good bar for quality software!
            noble.paul Noble Paul added a comment - - edited

            ab 

            My plan of action

            • I'll revert all the changes I have made as a part of this issue in the branch SOLR-13677_3
            • This will also mean I'll "Ignore"  the tests which are failing due to this 
            • I'll leave it there for a couple of days for you to take any further action. I mean fix the original issue
            • After that I'll merge those changes back into master & branch_8x

             

             

            noble.paul Noble Paul added a comment - - edited ab   My plan of action I'll revert all the changes I have made as a part of this issue in the branch  SOLR-13677_3 .  This will also mean I'll "Ignore"  the tests which are failing due to this  I'll leave it there for a couple of days for you to take any further action. I mean fix the original issue After that I'll merge those changes back into master & branch_8x    

            noble.paul, ab Instead of this multi-day revert plan (which meet seem confusing to understand), I recommend an immediate revert in master and branch_8x.

            ichattopadhyaya Ishan Chattopadhyaya added a comment - noble.paul , ab Instead of this multi-day revert plan (which meet seem confusing to understand), I recommend an immediate revert in master and branch_8x.
            noble.paul Noble Paul added a comment -

            Yes, I'm planning to do this ASAP.
            Why should I be the only one bothered about a memory leak

            noble.paul Noble Paul added a comment - Yes, I'm planning to do this ASAP. Why should I be the only one bothered about a memory leak

            Commit 042478cfa795dd537dcd4863a0524a73bad9a740 in lucene-solr's branch refs/heads/jira/SOLR-13677_3 from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=042478c ]

            SOLR-13677: reverting the last commit

            jira-bot ASF subversion and git services added a comment - Commit 042478cfa795dd537dcd4863a0524a73bad9a740 in lucene-solr's branch refs/heads/jira/ SOLR-13677 _3 from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=042478c ] SOLR-13677 : reverting the last commit
            noble.paul Noble Paul added a comment -

            As planned, I have committed the revert to SOLR-13677_3.

             

            I'll give it a few hours. run some testing and merge this to master / branch_8x

            noble.paul Noble Paul added a comment - As planned, I have committed the revert to  SOLR-13677_3 .   I'll give it a few hours. run some testing and merge this to master / branch_8x

            Commit 042478cfa795dd537dcd4863a0524a73bad9a740 in lucene-solr's branch refs/heads/master from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=042478c ]

            SOLR-13677: reverting the last commit

            jira-bot ASF subversion and git services added a comment - Commit 042478cfa795dd537dcd4863a0524a73bad9a740 in lucene-solr's branch refs/heads/master from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=042478c ] SOLR-13677 : reverting the last commit

            Commit a288710a64acdde6abc8ce96a0d3b3e18739ac32 in lucene-solr's branch refs/heads/master from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a288710 ]

            SOLR-13677: reverting the last commit (#863)

            jira-bot ASF subversion and git services added a comment - Commit a288710a64acdde6abc8ce96a0d3b3e18739ac32 in lucene-solr's branch refs/heads/master from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a288710 ] SOLR-13677 : reverting the last commit (#863)

            Commit b1bccf7cace424cb895ca6d05b30926697bfe86b in lucene-solr's branch refs/heads/branch_8x from Noble Paul
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b1bccf7 ]

            SOLR-13677: reverting the last commit

            jira-bot ASF subversion and git services added a comment - Commit b1bccf7cace424cb895ca6d05b30926697bfe86b in lucene-solr's branch refs/heads/branch_8x from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b1bccf7 ] SOLR-13677 : reverting the last commit

            This is a blocker for 8.3, but is still unassigned. ab, based on the private conversations, should I assign it to you?

            (Or, noble.paul, should I assign it back to you, since you were attempting the fix?)

            ishan Ishan Chattopadhyaya added a comment - This is a blocker for 8.3, but is still unassigned. ab , based on the private conversations, should I assign it to you? (Or, noble.paul , should I assign it back to you, since you were attempting the fix?)

            How much does it leak in 8.2? Does leak happens under certain condition or action? Is 8.2 prod ready?

            mkhl Mikhail Khludnev added a comment - How much does it leak in 8.2? Does leak happens under certain condition or action? Is 8.2 prod ready?

            ab, assigned to you for further action/guidance. If you think you'll not be able to get to this by 8.3, can I commit Noble's fix?

            ichattopadhyaya Ishan Chattopadhyaya added a comment - ab , assigned to you for further action/guidance. If you think you'll not be able to get to this by 8.3, can I commit Noble's fix?
            noble.paul Noble Paul added a comment -

            mkhl no. the leak happens only if you unload a plugin w/o unloading the core. With hot reloading of plugins, it'll be a very common usecase

             

             

            noble.paul Noble Paul added a comment - mkhl  no. the leak happens only if you unload a plugin w/o unloading the core. With hot reloading of plugins, it'll be a very common usecase    

            ichattopadhyaya my objections to the last patch from Noble are still valid, so I would object to committing the same unchanged patch again. And I don't think we should start from scratch - we should just fix the problems in Noble's patch that I listed in my review.

            I hope to be able to look at this tomorrow. If other stuff intervenes then I won't be able to work on this until Monday next week.

            ab Andrzej Bialecki added a comment - ichattopadhyaya  my objections to the last patch from Noble are still valid, so I would object to committing the same unchanged patch again. And I don't think we should start from scratch - we should just fix the problems in Noble's patch that I listed in my review. I hope to be able to look at this tomorrow. If other stuff intervenes then I won't be able to work on this until Monday next week.

            noble.paul, thanks for the clarification. 

            mkhl Mikhail Khludnev added a comment - noble.paul , thanks for the clarification. 

            I hope to be able to look at this tomorrow. If other stuff intervenes then I won't be able to work on this until Monday next week.

            Hi ab, could you please get a chance to look at this?

            ichattopadhyaya Ishan Chattopadhyaya added a comment - I hope to be able to look at this tomorrow. If other stuff intervenes then I won't be able to work on this until Monday next week. Hi ab , could you please get a chance to look at this?

            I wasn't sure what was the state of various branches so I created a new one, with the latest patch from Noble: jira/solr-13677-final and I started fixing the issues from review.

            ab Andrzej Bialecki added a comment - I wasn't sure what was the state of various branches so I created a new one, with the latest patch from Noble: jira/solr-13677-final and I started fixing the issues from review.

            This is an updated patch, after cleanup of the issues listed in the review, and with some additional changes:

            • I excluded the scope from the context, because in most cases we can reuse the parent context for components with different scopes.
            • I converted some internal, non-pluggable components to use the new API.

            This still is a large change and needs more testing - I'll need another day to be reasonably sure that it doesn't break things.

            ab Andrzej Bialecki added a comment - This is an updated patch, after cleanup of the issues listed in the review, and with some additional changes: I excluded the scope from the context, because in most cases we can reuse the parent context for components with different scopes. I converted some internal, non-pluggable components to use the new API. This still is a large change and needs more testing - I'll need another day to be reasonably sure that it doesn't break things.

            noble.paul I would appreciate your review.

            ab Andrzej Bialecki added a comment - noble.paul I would appreciate your review.
            noble.paul Noble Paul added a comment -

            ab can you raise a PR so that we can review easily

            noble.paul Noble Paul added a comment - ab can you raise a PR so that we can review easily

            Commit f07998fc234c81ff956a84ee508b85f8d573ef38 in lucene-solr's branch refs/heads/master from Andrzej Bialecki
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f07998f ]

            SOLR-13677: All Metrics Gauges should be unregistered by components that registered them.

            jira-bot ASF subversion and git services added a comment - Commit f07998fc234c81ff956a84ee508b85f8d573ef38 in lucene-solr's branch refs/heads/master from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f07998f ] SOLR-13677 : All Metrics Gauges should be unregistered by components that registered them.

            Commit 441af3e7aa5266e7929c93cb5f3abe0746e5c054 in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=441af3e ]

            SOLR-13677: All Metrics Gauges should be unregistered by components that registered them.

            jira-bot ASF subversion and git services added a comment - Commit 441af3e7aa5266e7929c93cb5f3abe0746e5c054 in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=441af3e ] SOLR-13677 : All Metrics Gauges should be unregistered by components that registered them.

            Commit d796eca84dbabe3ae9b3c27afc01ef3bee35acb1 in lucene-solr's branch refs/heads/branch_8_3 from Andrzej Bialecki
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d796eca ]

            SOLR-13677: All Metrics Gauges should be unregistered by components that registered them.

            jira-bot ASF subversion and git services added a comment - Commit d796eca84dbabe3ae9b3c27afc01ef3bee35acb1 in lucene-solr's branch refs/heads/branch_8_3 from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d796eca ] SOLR-13677 : All Metrics Gauges should be unregistered by components that registered them.

            Commit b029de191ebdfd50b81f29e3334ccaac46bd70da in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b029de1 ]

            SOLR-13677: Add a missing override, which resulted in missing metrics (reported by tflobbe).

            jira-bot ASF subversion and git services added a comment - Commit b029de191ebdfd50b81f29e3334ccaac46bd70da in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b029de1 ] SOLR-13677 : Add a missing override, which resulted in missing metrics (reported by tflobbe).

            Commit 2aa586909b911e66e1d8863aa89f173d69f86cd2 in lucene-solr's branch refs/heads/branch_8_3 from Andrzej Bialecki
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2aa5869 ]

            SOLR-13677: Add a missing override, which resulted in missing metrics (reported by tflobbe).

            jira-bot ASF subversion and git services added a comment - Commit 2aa586909b911e66e1d8863aa89f173d69f86cd2 in lucene-solr's branch refs/heads/branch_8_3 from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2aa5869 ] SOLR-13677 : Add a missing override, which resulted in missing metrics (reported by tflobbe).

            Commit 2aa586909b911e66e1d8863aa89f173d69f86cd2 in lucene-solr's branch refs/heads/jira/SOLR-13101 from Andrzej Bialecki
            [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2aa5869 ]

            SOLR-13677: Add a missing override, which resulted in missing metrics (reported by tflobbe).

            jira-bot ASF subversion and git services added a comment - Commit 2aa586909b911e66e1d8863aa89f173d69f86cd2 in lucene-solr's branch refs/heads/jira/ SOLR-13101 from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2aa5869 ] SOLR-13677 : Add a missing override, which resulted in missing metrics (reported by tflobbe).

            People

              ab Andrzej Bialecki
              noble.paul Noble Paul
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 6h 50m
                  6h 50m