Lucene - Core
LUCENE-5197

Add a method to SegmentReader to get the current index heap memory size

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 6.0
    • Component/s: core/codecs, core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      It would be useful to at least estimate the index heap size being used by Lucene. Ideally a method exposing this information at the SegmentReader level.

      1. LUCENE-5197.patch
        41 kB
        Robert Muir
      2. LUCENE-5197.patch
        55 kB
        Areek Zillur
      3. LUCENE-5197.patch
        53 kB
        Areek Zillur
      4. LUCENE-5197.patch
        52 kB
        Areek Zillur
      5. LUCENE-5197.patch
        48 kB
        Areek Zillur

        Activity

        Areek Zillur added a comment -

        Initial patch file that only takes into account classes in the codecs package to estimate the index heap memory usage for a SegmentReader.

        Adrien Grand added a comment -

        Very nice work, I agree it would be nice to be able to have a better overview of Lucene's memory usage! Regarding the patch, it looks to me like some null checks are unnecessary, e.g. AssertingFieldsConsumer.in.

        Dawid Weiss added a comment -

        I can foresee how all these calculations go out of sync with changing implementations. Wouldn't RamUsageEstimator (or a modification of it) be a better choice to do it dynamically somehow?

        I realize there is an overhead involved, but it should be more portable, accurate, and easier to maintain if it's not implemented separately in all of these classes.

        Just a thought.

        Adrien Grand added a comment -

        I was thinking about RamUsageEstimator but it has some pitfalls too, I think. For example, RAMInputStream has a link to RAMFile, which in turn has a link to the RAMDirectory. So every time you ran RamUsageEstimator on a format producer, you would take into account the whole directory size?
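
        The cycle described here can be sketched with a toy model (the class names below are illustrative stand-ins, not Lucene's actual RAMDirectory classes): a naive walk that follows every reference escapes through the file's back-pointer to its directory and ends up counting every file in it.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;

public class BackRefDemo {
    public static class Dir { List<FileBuf> files = new ArrayList<>(); }
    public static class FileBuf {
        Dir dir; byte[] bytes;
        public FileBuf(Dir d, int n) { dir = d; bytes = new byte[n]; d.files.add(this); }
    }
    public static class Stream { FileBuf file; public Stream(FileBuf f) { file = f; } }

    // Naive "count everything reachable" walk, sizing only byte[] payloads.
    public static long reachableBytes(Object root) {
        return walk(root, new IdentityHashMap<>());
    }

    private static long walk(Object o, IdentityHashMap<Object, Boolean> seen) {
        if (o == null || seen.put(o, Boolean.TRUE) != null) return 0;
        if (o instanceof byte[]) return ((byte[]) o).length;
        long sum = 0;
        if (o instanceof Stream) sum += walk(((Stream) o).file, seen);
        if (o instanceof FileBuf) {
            sum += walk(((FileBuf) o).dir, seen);   // the problematic back-reference
            sum += walk(((FileBuf) o).bytes, seen);
        }
        if (o instanceof Dir) for (FileBuf f : ((Dir) o).files) sum += walk(f, seen);
        return sum;
    }

    public static void main(String[] args) {
        Dir dir = new Dir();
        FileBuf a = new FileBuf(dir, 100);
        new FileBuf(dir, 1000); // unrelated file in the same directory
        // Measuring a stream over file `a` reports 1100, not 100, because the
        // walk escapes through a.dir and picks up the whole directory.
        System.out.println(reachableBytes(new Stream(a)));
    }
}
```

        Dropping the dir hop from the walk would bring the figure back to 100, which is effectively what the per-class hand-written accounting in the patch does.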

        Shai Erera added a comment -

        Wouldn't RamUsageEstimator (or its modification) be a better choice to do it dynamically somehow?

        I agree and had exactly that thought in mind. This is not the type of API that needs to be blazing fast, and the overhead in RamUsageEstimator is totally acceptable IMO. I can see how this gets incorrect very quickly...

        Dawid Weiss added a comment -

        I guess you'd have to modify RamUsageEstimator to allow a tree-visitor pattern, then you'd be able to only account for the nodes that matter (or ignore those you know shouldn't be counted).

        This also has some caveats – like references to things that shouldn't be counted (thread locals reaching out to thread instances, loggers reaching out to pretty much everything...). Having said that, I can see how such an implementation can be tested to detect things like this early, which with the manual counting may be a problem.

        Again – I'm not saying "no", I'm just expressing concern about the size and complexity of the patch. If there's a simpler way to do it that is performance-acceptable we should probably follow Occam's razor...
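
        A minimal sketch of the tree-visitor idea (this is not RamUsageEstimator itself; the Predicate-based filter and the per-object size constants are assumptions for illustration only): the walk is iterative and stack-based, and a field-level filter decides which references get entered at all, so thread locals, loggers, and similar fields are simply never crawled.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.function.Predicate;

public class FilteredEstimator {
    static final long OBJECT_HEADER = 16; // assumed constant; varies by VM
    static final long REF_SIZE = 8;       // assumed constant

    /** Iterative walk; enterField vetoes descending into problematic references. */
    public static long estimate(Object root, Predicate<Field> enterField) {
        if (root == null) return 0;
        IdentityHashMap<Object, Boolean> seen = new IdentityHashMap<>();
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(root);
        long total = 0;
        while (!stack.isEmpty()) {
            Object o = stack.pop();
            if (seen.put(o, Boolean.TRUE) != null) continue;
            Class<?> c = o.getClass();
            if (c.isArray()) {
                int len = Array.getLength(o);
                // crude: one byte per primitive element, a deliberate simplification
                total += OBJECT_HEADER + (long) len * (c.getComponentType().isPrimitive() ? 1 : REF_SIZE);
                if (!c.getComponentType().isPrimitive()) {
                    for (int i = 0; i < len; i++) {
                        Object e = Array.get(o, i);
                        if (e != null) stack.push(e);
                    }
                }
                continue;
            }
            total += OBJECT_HEADER;
            for (Class<?> k = c; k != null && k != Object.class; k = k.getSuperclass()) {
                for (Field f : k.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    total += REF_SIZE; // count the slot itself either way
                    if (f.getType().isPrimitive()) continue;
                    if (!enterField.test(f)) continue; // visitor veto: e.g. ThreadLocal fields
                    try {
                        if (f.trySetAccessible()) {
                            Object v = f.get(o);
                            if (v != null) stack.push(v);
                        }
                    } catch (IllegalAccessException e) { /* skip unreadable field */ }
                }
            }
        }
        return total;
    }

    /** Hypothetical demo payload. */
    public static class Node {
        byte[] data = new byte[100];
        byte[] cache = new byte[1000]; // pretend this should not be counted
    }
}
```

        With a filter like f -> f.getType() != ThreadLocal.class the walk never touches thread locals, and a sanity test can then assert the figure stays within expected bounds, in the spirit of the testing point above.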

        Adrien Grand added a comment -

        Again – I'm not saying "no", I'm just expressing concern about the size and complexity of the patch.

        Sure, I'm concerned about this too and I think the tree-visitor pattern for RamUsageEstimator is a nice idea!

        Dawid Weiss added a comment -

        It'd be cool to have it for other reasons as well (excluding certain fields from being counted, etc.). The minor glitch is that RamUsageEstimator's implementation is stack-based, not recursive (to prevent stack overflows), but it shouldn't be a biggie.

        Michael McCandless added a comment -

        This would be a great addition! And I agree if we can somehow do this with RUE, being able to restrict where it's allowed to "crawl", that would be nice.

        Separately, I wonder if we could get a breakdown of the RAM usage ... sort of like Explanation, which returns both the value and a String (English) description of how the value was computed. But this can come later ... just a ramBytesUsed returning long is a great start.

        Dawid Weiss added a comment -

        This can be done with the same implementation – a tree visitor could collect partial sums and aggregate them for the object hierarchy in the form of a tree rather than sum it all up.

        This is sort-of implemented here (although I kept the same implementation of RamUsageEstimator; a visitor pattern would be more elegant I think).
        https://github.com/dweiss/java-sizeof/blob/master/src/main/java/com/carrotsearch/sizeof/ObjectTree.java
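
        A toy version of that per-node aggregation (hypothetical names; this is not the linked ObjectTree implementation): each node carries its own bytes plus named children, totals are summed bottom-up, and the tree renders as an Explanation-style report.

```java
import java.util.ArrayList;
import java.util.List;

public class RamBreakdown {
    final String name;
    final long ownBytes;
    final List<RamBreakdown> children = new ArrayList<>();

    public RamBreakdown(String name, long ownBytes) {
        this.name = name;
        this.ownBytes = ownBytes;
    }

    public RamBreakdown add(RamBreakdown child) {
        children.add(child);
        return this;
    }

    /** Total = this node's own bytes plus all descendants' totals. */
    public long totalBytes() {
        long sum = ownBytes;
        for (RamBreakdown c : children) sum += c.totalBytes();
        return sum;
    }

    /** Indented report, one line per node, aggregated totals at every level. */
    public String render(String indent) {
        StringBuilder sb = new StringBuilder(indent)
            .append(name).append(": ").append(totalBytes()).append(" bytes\n");
        for (RamBreakdown c : children) sb.append(c.render(indent + "  "));
        return sb.toString();
    }

    public static void main(String[] args) {
        RamBreakdown segment = new RamBreakdown("segment", 0)
            .add(new RamBreakdown("terms index", 100))
            .add(new RamBreakdown("doc values", 50)
                .add(new RamBreakdown("field: id", 25)));
        System.out.print(segment.render(""));
    }
}
```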

        Uwe Schindler added a comment -

        From my perspective, usage of RamUsageEstimator is to be preferred; everything else gets outdated very fast (especially since we have no idea about caches like FieldCache being used). In production RAMDirectory is not used, things like MMapDirectory or NIOFSDirectory have no heap usage, and the default codec is not much either, so I see no reason not to trust the official RAM usage as reported by RamUsageEstimator.
        The memory usage accounting of IndexWriter is way different from what we have on the IndexReader side. The accounting done on the IndexWriter side is much more under the control of the Lucene code and is very fine-grained, but things like the proposed changes in FixedBitSet are just nonsense to me. RamUsageEstimator can estimate FixedBitSet very correctly (that's easy and, in my opinion, 100% correct).

        Robert Muir added a comment -

        RAMUsageEstimator is quite slow when you need to run it on lots of objects (e.g. the codec tree here).

        Also it totally breaks down if it hits certain objects like a ThreadLocal.

        Dawid Weiss added a comment -

        > Also it totally breaks down if it hits certain objects like a ThreadLocal.

        That's why I suggested a visitor pattern; you could tune it not to enter such variables. Also note that if there are lots of objects then the object representation overhead itself will be significant and will vary depending on each VM, its settings, etc.; a specific snippet of code to estimate each object's memory use may be faster, but it'll be either a nightmare to maintain or a very rough approximation.

        I think it'd be better to try to make RUE faster/ more flexible. Like Shai mentioned – if it's not a performance-critical API then the difference will not be at all significant.

        Areek Zillur added a comment - edited

        Attached a patch removing redundant null checks as Adrien suggested.

        First of all wanted to thank everybody for their valuable inputs.

        The reasons why I chose to have an explicit method to calculate the heap size, rather than using RAMUsageEstimator, have already surfaced in the discussion above (slow for many objects, incorrect for some types of objects).
        It would be nice to have an API to call (from the Solr admin, for example) to estimate the current index heap size.

        I do understand the "out of sync with implementation changes" concern; I only took the codecs into account for the size estimation, so that higher-level APIs need not implement the method.

        The suggested modified RAMUsageEstimator sounds nice, but as far as I understand, wouldn't the logic implementing the "excluded objects" change just as much, while being more implicit than the proposed solution above?

        Dawid Weiss added a comment -

        > incorrect for some types of objects

        Can you elaborate on this? Where was it incorrect?

        > would not the logic in implementing the "excluded objects" change just as much?

        I think it'd be much simpler to exclude the types of objects we know spin out of control (loggers, thread locals, thread references) and leave the remaining stuff accounted for. After all, if it's referenced, it does take space on the heap, so the figure is correct. What you're doing is actually a skewed view – it measures certain fields selectively.

        I was also thinking in terms of tests – one can create a sanity test which will create a small index, measure its RAM usage and then fail if it seems "too large" (because a thread local or some other field was accounted for). I don't see a way to do such a sanity check for per-class handcrafted code (unless you want to test against RamUsageEstimator, which would duplicate the effort anyway).

        Let me stress again that I'm not against your patch, I just have a gut feeling it'll be a recurring theme of new issues in Jira.

        Yet another idea sprang to my mind – perhaps if speed is an issue and certain types of objects can efficiently calculate their RAM usage (FSTs), we could make RamUsageEstimator "aware" of such objects by introducing an interface like:

        interface IKnowMySize {
          public long /* super. :) */ sizeMe();
        }
        

        Jokes aside, this could be implemented for classes which indeed have a complex structure and the rest (arrays, etc.) could be counted efficiently by walking the reference graph. Just a thought.
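
        The joke interface above is actually close to where Lucene eventually landed (the later Accountable interface with its ramBytesUsed() method). A sketch of the mechanism, with hypothetical names: structures that have a cheap closed-form size report it themselves, and a generic estimator trusts that figure instead of crawling into them.

```java
public class SelfSizing {
    /** Objects implementing this report their own heap footprint. */
    public interface KnowsOwnSize {
        long sizeInBytes();
    }

    /** Stands in for a complex structure (like an FST) that can compute its size cheaply. */
    public static class PackedStructure implements KnowsOwnSize {
        final byte[] blob;
        public PackedStructure(int n) { blob = new byte[n]; }
        @Override public long sizeInBytes() {
            return 16 + blob.length; // assumed header constant + payload
        }
    }

    /** Trust self-reported sizes; fall back to a crude constant otherwise. */
    public static long estimate(Object o) {
        if (o instanceof KnowsOwnSize) return ((KnowsOwnSize) o).sizeInBytes();
        return 16; // assumed shallow cost for everything else
    }
}
```

        A full walker would combine this with the reference-graph crawl: stop and take sizeInBytes() whenever a KnowsOwnSize node is hit, and count arrays and plain objects generically.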

        Robert Muir added a comment -

        What you're doing is actually a skewed view – it measures certain fields selectively.

        And from a big O perspective, this might be just fine.

        The way I see it, this would be a way to see how much RAM the Lucene segment needs for someone's content.
        Things like the terms index and docvalues fields grow according to the content in different ways: e.g. how large/how many terms you have, how they share prefixes, how many documents you have, and so on.

        The "skew" is just boring constants pulled out of the equation; even if it's 2 KB or so, it's not interesting at all since it's just a constant cost independent of the content.

        Areek Zillur added a comment -

        >Can you elaborate on this? Where was it incorrect?

        • By "incorrect" I meant walking into undesirable member variables in an object. In hindsight, that was a bad choice of wording; I think the correct word would be "inflexible".

        I do like the RamUsageEstimator "aware"-ness idea. I think that, along with some kind of filtering mechanism in RamUsageEstimator, would be perfect.

        Dawid Weiss added a comment -

        > And from a big O perspective, this might be just fine.

        He, he. From a big-O perspective, the cost of RUE vs. custom code is negligible.

        Anyway, fine – I still think it'd be a nice addition to RUE to allow selective counting (to exclude certain fields from the equation) and it'd be a perfect use case to apply here. But it can be used in other places (like in tests to count static memory usage held by a class, etc.).

        Michael McCandless added a comment -

        +1 to current patch, except SimpleTextFieldsReader does in fact use RAM (it has a termsCache, and it sneakily pre-loads all terms for each field into an FST!).

        I think we should go with this current approach, and then later, if/when we improve RUE to easily restrict where it crawls / speed it up / etc., then we can cutover.

        Robert Muir added a comment -

        Can we rename FixedGapTermsIndexReader.getSizeInBytes to FixedGapTermsIndexReader.ramBytesUsed?

        Other than that, the patch consistently uses the same name (ramBytesUsed) throughout; it's just this one that is inconsistent.

        Areek Zillur added a comment -

        Changed getSizeInBytes to ramBytesUsed as Robert suggested.

        Areek Zillur added a comment -

        Michael McCandless
        I can add a ramBytesUsed method to the SimpleTextTerms class to account for it, but only under the assumption that the SimpleTextTerms implementation will be used for SimpleTextFieldsReader (it uses the abstract Terms class in the termsCache). Comments?

        Michael McCandless added a comment -

        But only under the assumption that SimpleTextTerms implementation will be used for the SimpleTextFieldsReader (it uses the abstract Terms class in the termsCache). comments?

        I think it's fine to change its termsCache to be SimpleTextTerms. Thanks!

        Areek Zillur added a comment -

        Took the termsCache in SimpleTextFieldsReader into account, as discussed with Michael.

        Robert Muir added a comment -

        Some minor cleanups / improvements:

        Fixed calculations for the all-in-RAM DV impls: for the esoteric/deprecated ones, it just uses RUE rather than making the code complicated. Facet42 is easy, though, and accounts correctly now.

        Added a missing null check for VariableGapReader's FST (it can be null when there are no terms).

        ASF subversion and git services added a comment -

        Commit 1521267 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1521267 ]

        LUCENE-5197: Added SegmentReader.ramBytesUsed

        ASF subversion and git services added a comment -

        Commit 1521284 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1521284 ]

        LUCENE-5197: Added SegmentReader.ramBytesUsed

        Robert Muir added a comment -

        Thanks Areek!

        Adrien Grand added a comment -

        4.5 release -> bulk close


          People

          • Assignee: Unassigned
          • Reporter: Areek Zillur
          • Votes: 2
          • Watchers: 7
