Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0, 4.1, 6.0
    • Fix Version/s: 4.1, 6.0
    • Component/s: modules/other
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      Currently MemoryIndex uses BytesRef objects to represent terms and holds an int[] per term per field to represent postings. For highlighting, this creates a large number of objects for every search that (1) need to be garbage-collected and (2) cannot be reused.
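      To make the garbage concern concrete, the per-document usage pattern behind it looks roughly like the sketch below. MemoryIndex.addField and MemoryIndex.search are existing APIs; the field name "body" and the analyzer are placeholders.

          import org.apache.lucene.analysis.Analyzer;
          import org.apache.lucene.index.memory.MemoryIndex;
          import org.apache.lucene.search.Query;

          public class PerDocumentMatcher {
            // A fresh MemoryIndex is built and discarded for every document (as in
            // highlighting or percolation), so every term's BytesRef and every
            // per-term int[] posting array becomes garbage after the search.
            public static float scoreDocument(String text, Analyzer analyzer, Query query) {
              MemoryIndex index = new MemoryIndex();   // fresh internal structures on every call
              index.addField("body", text, analyzer);  // one BytesRef per term plus an int[] of positions
              return index.search(query);              // a score > 0.0f means the query matched this document
            }
          }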

      Attachments

      1. LUCENE-4515.patch (99 kB) by Simon Willnauer
      2. LUCENE-4515.patch (98 kB) by Simon Willnauer
      3. LUCENE-4515.patch (87 kB) by Simon Willnauer
      4. LUCENE-4515.patch (63 kB) by Simon Willnauer

        Issue Links

          Activity

          Simon Willnauer added a comment -

          I included a fix for LUCENE-3865 along the lines in the upcoming patch.

          Simon Willnauer added a comment -

          Here is a patch that cuts MemoryIndex over to ByteBlockPool (to store terms) / IntBlockPool (to store postings), using BytesRefHash to hold the term dictionary. This patch still has some rough edges and some nocommits, but in general this is what I had in mind. I factored IntBlockPool out of TermsHash / DWPT and made it more general and usable for slice writing and reading. I haven't done any benchmarks yet, but I could imagine it's even faster than before due to better memory alignment.

          comments welcome.
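          For readers unfamiliar with these utilities, a minimal sketch of the term-dictionary side of the change is shown below, using the existing BytesRefHash and ByteBlockPool classes; the postings side additionally writes positions as slices into an IntBlockPool (see the slice sketch further down). The terms and values are illustrative only.

            import org.apache.lucene.util.ByteBlockPool;
            import org.apache.lucene.util.BytesRef;
            import org.apache.lucene.util.BytesRefHash;

            public class TermDictionarySketch {
              public static void main(String[] args) {
                // All term bytes live in large shared byte[] blocks owned by the pool,
                // instead of one standalone BytesRef object per term.
                ByteBlockPool termPool = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
                BytesRefHash terms = new BytesRefHash(termPool);

                int fooId = terms.add(new BytesRef("foo"));  // new term: returns its id
                terms.add(new BytesRef("bar"));
                int dupId = terms.add(new BytesRef("foo"));  // existing term: returns -(id + 1)

                BytesRef scratch = new BytesRef();
                terms.get(fooId, scratch);                   // reads the term bytes back into scratch
                System.out.println(terms.size() + " unique terms, duplicate maps to id " + (-dupId - 1));
              }
            }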

          Robert Muir added a comment -

          Are we sure this is the right direction to go for MemoryIndex?

          I think it's being abused for highlighting, but it has other real use cases, and we shouldn't make it worse for those use cases just because highlighting abuses it.

          Instead, if the concern is garbage, I think we should maybe just use term vectors and compute this stuff at index time.

          Simon Willnauer added a comment -

          Are we sure this is the right direction to go for MemoryIndex?

          We have a couple of options here. One would be a lightweight DWPT (an atomic writer, or however you want to call it), but our IndexWriter has pretty significant overhead for indexing just one document and executing a search on it, so unless we do all the refactoring I want to do long term, I think this class should be supported.

          I think it's being abused for highlighting, but it has other real use cases, and we shouldn't make it worse for those use cases just because highlighting abuses it.

          I couldn't agree more, though. Maybe I used this as a bad example. In ElasticSearch we use it for percolation (http://www.elasticsearch.org/guide/reference/java-api/percolate.html) and this actually works pretty well with the MemoryIndex. I had other use cases in the past where this was handy, too. I also see folks on the mailing list opening issues, so unless we have a similar lightweight replacement I don't see why we should not improve this implementation. The main reason I improved this here is that we want to reuse the internal buffers and, if possible, move away from objects.

          Instead, if the concern is garbage, I think we should maybe just use term vectors and compute this stuff at index time.

          This might work for highlighting, I agree.

          Alan Woodward added a comment -

          We use MemoryIndex for monitoring, very similar to the ES percolator, running lots of queries over individual documents in a queue. We've occasionally had GC issues with this, so anything that makes the garbage collector happier is good for us.

          Michael McCandless added a comment -

          Cool: you used the same slice idea that we use to hold postings in RAM in shared byte[]s, but with int[]s instead. This should be a huge reduction in GC load for MemoryIndex.

          I agree that DocFieldProcessor.docBoost is unused...

          synchronizedAllocator looks unused? I guess you added that after
          removing all sync from RecyclingByteBlockAllocator ... but I think we
          can just add synchronizedAllocator back later if/when we need it?
          Separately can you call out that RecyclingByteBlockAllocator is not
          thread safe in its javadocs?

          int[] start; // nocommit maybe we can safe the end array and just check freq - need to change the SliceReader for this

          I think you need the start ... because if you used more than one slice
          you won't know how to read "backwards" to get to the starting slice?

          intBlockPool = new IntBlockPool(); // nocommit expose allocator and impl a recycling one

          If we do that we have to make sure that allocator clears each int[]
          before returning it, in getIntBlock().

          The added MemoryIndex.reset method is sort of ... spooky? Like, do we
          really need/want to reuse a MemoryIndex? (I guess this is because we
          added passing in an allocator to the ctor ... so you want the byte[]'s
          returned to it ... but that also makes me nervous: should we really
          pass in an external allocator...?).
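          To make the slice discussion easier to follow, here is a minimal sketch of the slice writer/reader the patch factors out of TermsHash / DWPT. The IntBlockPool.SliceWriter / SliceReader names follow the patch as discussed here and may differ slightly from what was finally committed; the positions are placeholders.

            import org.apache.lucene.util.IntBlockPool;

            public class PostingSliceSketch {
              public static void main(String[] args) {
                IntBlockPool pool = new IntBlockPool();

                // Write one term's positions as a chain of slices inside shared int[] blocks.
                IntBlockPool.SliceWriter writer = new IntBlockPool.SliceWriter(pool);
                int start = writer.startNewSlice();   // where this term's slice chain begins
                for (int position : new int[] {3, 17, 42}) {
                  writer.writeInt(position);
                }
                int end = writer.getCurrentOffset();  // the per-term "end" discussed above

                // Read the positions back without allocating any per-term objects.
                IntBlockPool.SliceReader reader = new IntBlockPool.SliceReader(pool);
                reader.reset(start, end);
                while (!reader.endOfSlice()) {
                  System.out.println(reader.readInt());
                }
              }
            }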

          Robert Muir added a comment -

          I had other use cases in the past where this was handy, too. I also see folks on the mailing list opening issues, so unless we have a similar lightweight replacement I don't see why we should not improve this implementation.

          I think we can separate improving the internal implementation (which is one thing) from passing allocators/ByteBlockPools into ctors.

          To me, this is not a good API decision because it's essentially malloc() and free(). I don't think we should do it.

          Simon Willnauer added a comment -

          synchronizedAllocator looks unused? I guess you added that after
          removing all sync from RecyclingByteBlockAllocator ... but I think we
          can just add synchronizedAllocator back later if/when we need it?
          Separately can you call out that RecyclingByteBlockAllocator is not
          thread safe in its javadocs?

          Regarding javadocs, I thought I did this... will fix. Regarding sync: yeah, let's drop it; we can still add it back if needed, trivial though!

          I think you need the start ... because if you used more than one slice
          you won't know how to read "backwards" to get to the starting slice?

          The comment is on start but it says "end". I think, given that we know the freq, we can read the slice without storing the end, but we'd need to change SliceReader for it, and I am not sure that is worth the trouble we could get into. Still, it would save 4 bytes per term.

          If we do that we have to make sure that allocator clears each int[]
          before returning it, in getIntBlock().

          We already rely on this in ByteBlockPool, which likely doesn't work at this time, but we don't run into it since we don't reuse in DWPT? I will add a test.

          The added MemoryIndex.reset method is sort of ... spooky? Like, do we
          really need/want to reuse a MemoryIndex? (I guess this is because we
          added passing in an allocator to the ctor ... so you want the byte[]'s
          returned to it ... but that also makes me nervous: should we really
          pass in an external allocator...?).

          I think reuse is a special use case and I guess we should allow it. Yet I totally agree this is risky. I suggest making this possible only if you subclass, exposing this stuff via a protected API, so if you really, really want to do this you can by subclassing?

          Michael McCandless added a comment -

          The comment is on start but it says "end". I think, given that we know the freq, we can read the slice without storing the end, but we'd need to change SliceReader for it, and I am not sure that is worth the trouble we could get into. Still, it would save 4 bytes per term.

          Ahh I see, right! It's not needed. You do need the "end" per term as you build up the slices, but once done you can rely entirely on freq.

          We already rely on this in ByteBlockPool, which likely doesn't work at this time, but we don't run into it since we don't reuse in DWPT? I will add a test.

          Hmm if we never reuse in DWPT then we don't need to clear...

          I think reuse is a special use case and I guess we should allow it. Yet I totally agree this is risky. I suggest making this possible only if you subclass, exposing this stuff via a protected API, so if you really, really want to do this you can by subclassing?

          I think if we remove reset(), and then have a protected ctor that can pass in the allocator ... maybe that's OK? Still makes me nervous ... we should mark that ctor experimental ...

          Simon Willnauer added a comment -

          Here is a new patch removing all the nocommits (except the one that says I should move IntBlockPool to o.a.l.util; I will do that before I commit).

          I added more tests, cleaned up *BlockPool, and added javadocs in several places.

          MemoryIndex now only allows you to reset and specify your own allocator if you are in the same package. If you really need to do this you now can, but it's not public API, nor is it guaranteed in terms of backwards compatibility (it's a module anyway). I think this is a fair compromise here.

          Robert Muir added a comment -

          we should mark that ctor experimental ...

          It should definitely be experimental. Tomorrow we might decide it's cleaner to encode this with RAMFile or something.

          Simon Willnauer added a comment -

          It should definitely be experimental. Tomorrow we might decide it's cleaner to encode this with RAMFile or something.

          Wait, it's package-private, not protected, so there is really no need to annotate it. We can change it as we want.

          Michael McCandless added a comment -

          What makes me nervous is the precedent of passing allocators to APIs of our public classes (even if the API is package private) ...

          Maybe we should fork off a separate issue to discuss the reuse / allocators? The rest of this patch looks awesome.

          Robert Muir added a comment -

          I have two concerns:
          1. The passing of allocators to public classes (Mike's concern). Why can't we just have a method like reset()? This is way cleaner; the data structure is no longer exposed.
          2. The whole idea that this is "expert", e.g. highlighting isn't fast unless you use crazy APIs that only ES/Solr use. This just makes Lucene less usable as a whole. Instead of hacking around the highlighter here with BlockPoolAllocators, we should fix the highlighter to work well by default, e.g. it should have a sugar method like highlight(TopDocs) that uses MemoryIndex.reset() behind the scenes. This would be a simple and easy method for anyone to use.
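          As a purely hypothetical illustration of that proposal (no such highlight(TopDocs) sugar exists in this patch), the reuse pattern behind it would look roughly like the sketch below. The field name, the hitTexts parameter, and the fragment extraction are placeholders, and reset() only became a plain public method on MemoryIndex later; at this point in the thread its visibility was still being debated.

            import org.apache.lucene.analysis.Analyzer;
            import org.apache.lucene.index.memory.MemoryIndex;
            import org.apache.lucene.search.Query;

            public class ReusingHighlighterSketch {
              private final MemoryIndex index = new MemoryIndex();  // reused across all hits

              // Hypothetical sugar: re-analyze each hit's text into the same MemoryIndex
              // instead of building a fresh index per document.
              public void highlight(Query query, Analyzer analyzer, String[] hitTexts) {
                for (String text : hitTexts) {
                  index.reset();                           // drop the previous document, keep pooled buffers
                  index.addField("body", text, analyzer);
                  if (index.search(query) > 0.0f) {
                    // ... extract and emit fragments for this hit (omitted) ...
                  }
                }
              }
            }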

          Simon Willnauer added a comment -

          OK, I will try to summarize this again. I have two issues:

          1. reduce the GC load on MemIndex if it is used extensively - that is why I moved to ByteBlockPool / IntBlockPool
          2. control memory consumption / buffering in a very flexible way. When buffers are reused and how many of them.

          I don't give a fuck about highlighting at this point. I want to solve the issue with MemoryIndex that exists today, and if Rob wants to solve highlighting, OK, fine, go for it, but that is unrelated. I already removed all the changes to WeightedSpanTermExtractor in the latest patch. I really don't see the big problem with allowing expert users to have more control over it. This patch fixes problem 1 for everybody and allows folks with problem 2 to make use of allocators, albeit with a lot of effort (subclass + same package). That really makes Rob's second concern invalid.

          Michael McCandless added a comment -

          1. reduce the GC load on MemIndex if it is used extensively - that is why I moved to ByteBlockPool / IntBlockPool
          2. control memory consumption / buffering in a very flexible way. When buffers are reused and how many of them.

          I think the current patch, minus passing allocator to MemoryIndex, solves #1 very well?

          I think we should open a new Jira issue for #2 since it's apparently somewhat controversial. In that issue I'd really like to understand "the memory consumption / buffering in a very flexible way / when buffers are reused and how many of them" use cases. If it's really an accounting issue (knowing how many bytes a given MemoryIndex is consuming) maybe we can expose that in other ways ... eg .sizeInBytes() method, or app passes in a Counter that MemoryIndex updates w/ how much RAM it thinks it's using, or ... something else?
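          For reference, the Counter idea would look roughly like the following. The way MemoryIndex would update the counter (and any sizeInBytes() method) is hypothetical here, but org.apache.lucene.util.Counter itself is an existing utility.

            import org.apache.lucene.util.Counter;

            public class RamAccountingSketch {
              public static void main(String[] args) {
                // The app owns the counter; MemoryIndex would (hypothetically) add to it
                // whenever it allocates a new byte[]/int[] block, so the app can watch RAM usage.
                Counter bytesUsed = Counter.newCounter();
                // e.g. inside MemoryIndex: bytesUsed.addAndGet(blockSizeInBytes);
                bytesUsed.addAndGet(32 * 1024);
                System.out.println("MemoryIndex is using ~" + bytesUsed.get() + " bytes");
              }
            }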

          Simon Willnauer added a comment -

          Alright guys, here is a new patch that accepts a long instead of the allocators, indicating an upper memory bound for reusing buffers in the pools when reset is called. I think this should make everybody happy and we can move forward here! Let me know what you think!
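          For illustration, usage of that API would look roughly like the sketch below. The exact constructor shape (and whether it is package-private, per the earlier discussion) may differ from what was committed; the bound and field name are placeholders.

            import org.apache.lucene.analysis.Analyzer;
            import org.apache.lucene.index.memory.MemoryIndex;

            public class BoundedReuseSketch {
              public static void indexDocs(String[] docs, Analyzer analyzer) {
                // Up to ~1 MB of pooled byte[]/int[] buffers may be kept across reset() calls;
                // anything beyond the bound is released to the garbage collector.
                MemoryIndex index = new MemoryIndex(true /* storeOffsets */, 1 << 20 /* max reused bytes */);
                for (String doc : docs) {
                  index.reset();                        // recycle buffers within the configured bound
                  index.addField("body", doc, analyzer);
                  // ... run queries against this document ...
                }
              }
            }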

          Michael McCandless added a comment -

          Maybe rename "maxBufferedBytes" to "maxReusedBytes"? (Because it says how many bytes can be retained even after reset is called).

          I still don't like bringing allocation issues (even a long maxReusedBytes) into our APIs: this is Java. But since this is package private I guess it's OK.

          Can we default it to 0 not 5 MB? I think on reset() it's very unexpected that this class would still hold onto 5 MB RAM ...

          Why does reset() pass true for reuseFirst...?

          Simon Willnauer added a comment -

          New patch:

          • added tests for norms
          • renamed the buffer parameter
          • defaulted the reuse buffer to 0
          • set reuseFirst = false - not needed (missed that in the previous patch)

          I think it's ready; I will commit in a few hours.

          Michael McCandless added a comment -

          +1, thanks Simon!

          Simon Willnauer added a comment -

          Committed to trunk in rev 1404946.
          Backported to 4x in rev 1404955.

          Thanks guys.

          Commit Tag Bot added a comment -

          [branch_4x commit] Simon Willnauer
          http://svn.apache.org/viewvc?view=revision&revision=1404955

          LUCENE-4515: Make MemoryIndex more memory efficient


            People

            • Assignee: Simon Willnauer
            • Reporter: Simon Willnauer
            • Votes: 0
            • Watchers: 3
