Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      Since Lucene 4.5, you can see how much memory Lucene is using at a basic level by looking at SegmentReader.ramBytesUsed().

      In 4.11 it's already improved: you can pull the codec producers and get RAM usage split out by postings, norms, docvalues, stored fields, term vectors, etc.

      Unfortunately most toString()s are fairly useless, so you don't have any insight further than that, even though behind the scenes it's mostly just adding up other Accountables.

      So instead, if we improve the toString()s and let an Accountable return its children, we can connect all the dots, and you can easily diagnose/debug issues and see what is going on. I know I've been frustrated with having to hack up tons of System.out.println calls during development to see this stuff.

      So I think we should add this method to Accountable:

        /**
         * Returns nested resources of this class. 
         * The result should be a point-in-time snapshot (to avoid race conditions).
         * @see Accountables
         */
        // TODO: on java8 make this a default method returning emptyList
        Iterable<? extends Accountable> getChildResources();
      

      We can also add a simple helper method for quick debugging, Accountables.toString(Accountable), to print the "tree". Example output for a Lucene segment:

      _5f(5.0.0):C8330469: 36.4 MB
      |-- postings [PerFieldPostings(formats=1)]: 8 MB
          |-- format 'Lucene41_0' [BlockTreeTermsReader(fields=6,delegate=Lucene41PostingsReader(positions=true,payloads=false))]: 8 MB
              |-- field 'alternatenames' [BlockTreeTerms(terms=3360242,postings=13779349,positions=17102250,docs=2876726)]: 945.2 KB
                  |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=23318,arcs=66497)]: 945.1 KB
              |-- field 'asciiname' [BlockTreeTerms(terms=2451266,postings=16849659,positions=16891234,docs=8329981)]: 686.1 KB
                  |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=12976,arcs=44103)]: 686 KB
              |-- field 'geonameid' [BlockTreeTerms(terms=8363399,postings=33321876,positions=-1,docs=8330469)]: 1.3 MB
                  |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=528,arcs=66225)]: 1.3 MB
              |-- field 'latitude' [BlockTreeTerms(terms=8714542,postings=33321876,positions=-1,docs=8330469)]: 1.7 MB
                  |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=854,arcs=77300)]: 1.7 MB
              |-- field 'longitude' [BlockTreeTerms(terms=11557222,postings=33321876,positions=-1,docs=8330469)]: 2.6 MB
                  |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=1577,arcs=114570)]: 2.6 MB
              |-- field 'name' [BlockTreeTerms(terms=2598879,postings=16833071,positions=16874267,docs=8330325)]: 771.5 KB
                  |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=13790,arcs=46514)]: 771.3 KB
              |-- delegate [Lucene41PostingsReader(positions=true,payloads=false)]: 32 bytes
      |-- norms [Lucene49NormsProducer(fields=3,active=3)]: 15.9 MB
          |-- field 'alternatenames' [byte array]: 7.9 MB
          |-- field 'asciiname' [table compressed [Packed64SingleBlock4(bitsPerValue=4,size=8330469,blocks=520655)]]: 4 MB
          |-- field 'name' [table compressed [Packed64SingleBlock4(bitsPerValue=4,size=8330469,blocks=520655)]]: 4 MB
      |-- docvalues [PerFieldDocValues(formats=1)]: 12.1 MB
          |-- format 'Lucene410_0' [Lucene410DocValuesProducer(fields=5)]: 12.1 MB
              |-- addresses field 'alternatenames' [MonotonicBlockPackedReader(blocksize=16384,size=407026,avgBPV=16)]: 808.5 KB
              |-- addresses field 'asciiname' [MonotonicBlockPackedReader(blocksize=16384,size=330528,avgBPV=17)]: 698.6 KB
              |-- addresses field 'name' [MonotonicBlockPackedReader(blocksize=16384,size=335020,avgBPV=17)]: 703.7 KB
              |-- ord index field 'alternatenames' [MonotonicBlockPackedReader(blocksize=16384,size=8330470,avgBPV=9)]: 9.8 MB
              |-- reverse index field 'alternatenames' [ReverseTermsIndex(size=6360)]: 77.9 KB
                  |-- term bytes [PagedBytes(blocksize=32768)]: 67.7 KB
                  |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=6360,avgBPV=13)]: 10.2 KB
              |-- reverse index field 'asciiname' [ReverseTermsIndex(size=5165)]: 60.1 KB
                  |-- term bytes [PagedBytes(blocksize=32768)]: 53 KB
                  |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=5165,avgBPV=11)]: 7 KB
              |-- reverse index field 'name' [ReverseTermsIndex(size=5235)]: 61.2 KB
                  |-- term bytes [PagedBytes(blocksize=32768)]: 54.1 KB
                  |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=5235,avgBPV=11)]: 7.1 KB
      |-- stored fields [CompressingStoredFieldsReader(mode=FAST,chunksize=16384)]: 216.3 KB
          |-- stored field index [CompressingStoredFieldsIndexReader(blocks=65)]: 216.3 KB
              |-- doc base deltas: 55.8 KB
              |-- start pointer deltas: 158.9 KB
      |-- term vectors [CompressingTermVectorsReader(mode=FAST,chunksize=4096)]: 224 KB
          |-- term vector index [CompressingStoredFieldsIndexReader(blocks=67)]: 224 KB
              |-- doc base deltas: 65.6 KB
              |-- start pointer deltas: 156.8 KB
      

      Note that this works for any Accountable, e.g. NRTCachingDirectory, OrdinalMap, Suggesters, FSTs, and so on. You can also traverse the graph yourself and output whatever you want.
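To illustrate "traverse the graph yourself", here is a minimal sketch of a recursive tree dump. The Accountable interface below is a local stand-in that mirrors the proposed method so the example is self-contained; Node and TreeDump are hypothetical names invented for this illustration and are not part of the patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Local stand-in for the proposed interface (assumption: same two methods).
interface Accountable {
  long ramBytesUsed();
  Iterable<? extends Accountable> getChildResources();
}

// Hypothetical immutable node, used only to build a demo tree.
final class Node implements Accountable {
  private final String name;
  private final long bytes;
  private final List<Accountable> children;

  Node(String name, long bytes, Accountable... children) {
    this.name = name;
    this.bytes = bytes;
    // Copy the array so later changes by the caller are not visible here.
    this.children = Collections.unmodifiableList(Arrays.asList(children.clone()));
  }

  @Override public long ramBytesUsed() { return bytes; }
  @Override public Iterable<? extends Accountable> getChildResources() { return children; }
  @Override public String toString() { return name; }
}

// Hypothetical helper: walks the children depth-first and prints one line per node.
final class TreeDump {
  static String dump(Accountable root) {
    StringBuilder sb = new StringBuilder();
    dump(sb, root, 0);
    return sb.toString();
  }

  private static void dump(StringBuilder sb, Accountable a, int depth) {
    // Direct children get no indent; each deeper level is indented four spaces.
    for (int i = 1; i < depth; i++) {
      sb.append("    ");
    }
    if (depth > 0) {
      sb.append("|-- ");
    }
    sb.append(a).append(": ").append(a.ramBytesUsed()).append(" bytes\n");
    for (Accountable child : a.getChildResources()) {
      dump(sb, child, depth + 1);
    }
  }

  public static void main(String[] args) {
    Accountable segment = new Node("segment", 30,
        new Node("postings", 20, new Node("term index", 20)),
        new Node("norms", 10));
    System.out.print(dump(segment));
  }
}
```

The same walk could just as well sum sizes per node type or look for the largest leaf; only the per-node action changes. This sketch prints raw byte counts rather than the humanized units shown in the example output.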

      To be safe, I define the returned graph as a "point in time snapshot" that is free of race conditions. The Accountables helper methods provide this, and they also prevent access (even via cast) to data structures you shouldn't be able to get to; they just provide information.
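One way to get both properties (a stable snapshot and no access via cast) is to copy the numbers and descriptions into a fresh anonymous object. The following is a sketch under that assumption, not the helper from the patch; the Accountable interface is a local stand-in, and Snapshots, GrowingCache, and SnapshotDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-in for the proposed interface (assumption: same two methods).
interface Accountable {
  long ramBytesUsed();
  Iterable<? extends Accountable> getChildResources();
}

// Hypothetical helper: captures description, size, and children when called.
final class Snapshots {
  private Snapshots() {}

  static Accountable snapshot(final String description, Accountable in) {
    final long bytes = in.ramBytesUsed();
    List<Accountable> copy = new ArrayList<>();
    for (Accountable child : in.getChildResources()) {
      copy.add(snapshot(child.toString(), child));
    }
    final List<Accountable> children = Collections.unmodifiableList(copy);
    // The anonymous class holds no reference to 'in', so a cast cannot reach it.
    return new Accountable() {
      @Override public long ramBytesUsed() { return bytes; }
      @Override public Iterable<? extends Accountable> getChildResources() { return children; }
      @Override public String toString() { return description; }
    };
  }
}

// Hypothetical mutable structure, used only to show that the snapshot is stable.
final class GrowingCache implements Accountable {
  private final List<Long> entrySizes = new ArrayList<>();

  synchronized void add(long bytes) { entrySizes.add(bytes); }

  @Override public synchronized long ramBytesUsed() {
    long sum = 0;
    for (long b : entrySizes) {
      sum += b;
    }
    return sum;
  }

  @Override public synchronized Iterable<? extends Accountable> getChildResources() {
    // Build a fresh list under the lock so callers never iterate live state.
    List<Accountable> out = new ArrayList<>();
    for (final long b : entrySizes) {
      out.add(new Accountable() {
        @Override public long ramBytesUsed() { return b; }
        @Override public Iterable<? extends Accountable> getChildResources() {
          return Collections.<Accountable>emptyList();
        }
        @Override public String toString() { return "entry"; }
      });
    }
    return Collections.unmodifiableList(out);
  }
}

final class SnapshotDemo {
  public static void main(String[] args) {
    GrowingCache cache = new GrowingCache();
    cache.add(10);
    Accountable snap = Snapshots.snapshot("cache", cache);
    cache.add(5); // later growth is not reflected in the snapshot
    System.out.println(snap + ": " + snap.ramBytesUsed() + " bytes");
  }
}
```

Note that this sketch copies one level at a time, so the size and the child list of a node are read in two separate calls; a structure that needs them to agree exactly would have to take its own lock around both.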

      Since we aren't on Java 8 yet (and cannot provide a simple default method), I think we should just add the method to Accountable, but add default emptyList() implementations to impacted data structures such as DocIDSet and Suggester. The codec APIs are lower level, and there I think it's best to leave the method abstract, since they should really be providing useful information.
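For comparison, this is roughly what the Java 8 variant mentioned in the TODO could look like, together with the idea of keeping the method abstract for lower-level APIs. It is a sketch only: the interface is a local stand-in, and SimpleBitSet, CodecLikeProducer, and DemoNormsProducer are hypothetical classes that do not exist in Lucene.

```java
import java.util.Collections;

// Local stand-in showing the Java 8 shape: a default that reports no children.
interface Accountable {
  long ramBytesUsed();

  default Iterable<? extends Accountable> getChildResources() {
    return Collections.emptyList();
  }
}

// Hypothetical high-level structure: it simply inherits the empty default.
final class SimpleBitSet implements Accountable {
  private final long[] bits;

  SimpleBitSet(int numBits) {
    bits = new long[(numBits + 63) >>> 6];
  }

  @Override public long ramBytesUsed() {
    // Counts only the array payload; object and array headers are ignored here.
    return 8L * bits.length;
  }
}

// Hypothetical codec-level base class: re-declaring the method as abstract
// forces every subclass to report its children explicitly.
abstract class CodecLikeProducer implements Accountable {
  @Override public abstract Iterable<? extends Accountable> getChildResources();
}

final class DemoNormsProducer extends CodecLikeProducer {
  private final SimpleBitSet docsWithField = new SimpleBitSet(128);

  @Override public long ramBytesUsed() {
    return docsWithField.ramBytesUsed();
  }

  @Override public Iterable<? extends Accountable> getChildResources() {
    return Collections.singletonList(docsWithField);
  }
}

final class DefaultMethodDemo {
  public static void main(String[] args) {
    DemoNormsProducer producer = new DemoNormsProducer();
    for (Accountable child : producer.getChildResources()) {
      System.out.println(child.getClass().getSimpleName() + ": " + child.ramBytesUsed() + " bytes");
    }
  }
}
```

Without default methods, the same split is achieved by hand: each high-level structure returns an empty list itself, while the codec base classes leave the method abstract.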

      1. LUCENE-5949.patch
        240 kB
        Robert Muir

        Activity

        Robert Muir added a comment -

        Patch. It's somewhat large since it includes improved toString()s everywhere in the codec API (which IMO is a good thing in general).

        Additionally, I found some bugs (missing codec checks in the old term vectors codec, broken hashing on FieldInfo with MemoryDV, etc.) and fixed those here too.

        I added assertions to AssertingCodec and to TestUtil.checkXXX to ensure that toString() works, that the returned iterators are immutable, and that the implementations work.

        Michael McCandless added a comment -

        +1, this looks wonderful!

        Now there is no more mystery left when users are confused about what's using RAM in Lucene...

        Dawid Weiss added a comment -

        Very cool. I just needed it very recently and had to inspect stuff manually.

        ASF subversion and git services added a comment -

        Commit 1625275 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1625275 ]

        LUCENE-5949: Add Accountable.getChildResources

        ASF subversion and git services added a comment -

        Commit 1625356 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1625356 ]

        LUCENE-5949: Add Accountable.getChildResources

        ASF subversion and git services added a comment -

        Commit 1625357 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1625357 ]

        LUCENE-5949: add addresses child to binary fieldcacheimpl, add some missing unmodifiable()

        Anshum Gupta added a comment -

        Bulk close after 5.0 release.


          People

          • Assignee: Unassigned
          • Reporter: Robert Muir
          • Votes: 1
          • Watchers: 6
