Lucene - Core
  1. Lucene - Core
  2. LUCENE-4537

Move RateLimiter up to Directory and make it IOContext aware

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1, 6.0
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Currently the RateLimiter only applies to FSDirectory which is fine in general but always requires casts and other dir. impls (custom ones could benefit from this too.) We are also only able to rate limit merge operations which limits the functionality here a lot. Since we have the context information what the IndexOutput is used for we can use that for rate limiting.

      1. LUCENE-4537.patch
        25 kB
        Simon Willnauer
      2. LUCENE-4537.patch
        15 kB
        Simon Willnauer
      3. LUCENE-4537.patch
        17 kB
        Simon Willnauer
      4. LUCENE-4537.patch
        16 kB
        Simon Willnauer

        Activity

        Hide
        Simon Willnauer added a comment -

        here is a patch

        Show
        Simon Willnauer added a comment - here is a patch
        Hide
        Robert Muir added a comment -

        you can't put nulls into concurrenthashmaps.

        Show
        Robert Muir added a comment - you can't put nulls into concurrenthashmaps.
        Hide
        Robert Muir added a comment -

        Lets make it a hashmap: CHM is entirely wasteful here (on space etc).

        There is no need for concurrency: its a setter.

        Show
        Robert Muir added a comment - Lets make it a hashmap: CHM is entirely wasteful here (on space etc). There is no need for concurrency: its a setter.
        Hide
        Robert Muir added a comment -

        Actually I don't think we should do this: here's my line of reasoning.

        Instead of dictacting data structures and maps in this abstract class,
        really, we should just have abstract methods.

        But we already have these! they are createOutput/openInput!

        Anything else is unnecessary bloat here.

        Show
        Robert Muir added a comment - Actually I don't think we should do this: here's my line of reasoning. Instead of dictacting data structures and maps in this abstract class, really, we should just have abstract methods. But we already have these! they are createOutput/openInput! Anything else is unnecessary bloat here.
        Hide
        Simon Willnauer added a comment -

        new patch simplifying this datastructure to a straight array indexed by Context#ordinal(). no real overhead to what we had before.

        Show
        Simon Willnauer added a comment - new patch simplifying this datastructure to a straight array indexed by Context#ordinal(). no real overhead to what we had before.
        Hide
        Simon Willnauer added a comment -

        But we already have these! they are createOutput/openInput!

        you mean for each directory I want this in I need to subclass the dir an pass it in? That doesn't seem reasonable. Another way would be to have this per-context stuff in the RateLimiter though which I don't like that much to be honest.

        Show
        Simon Willnauer added a comment - But we already have these! they are createOutput/openInput! you mean for each directory I want this in I need to subclass the dir an pass it in? That doesn't seem reasonable. Another way would be to have this per-context stuff in the RateLimiter though which I don't like that much to be honest.
        Hide
        Robert Muir added a comment -

        Its absolutely reasonable. Otherwise the APIs are broken here.

        putting this concrete map on an abstract class that does nothing with it: thats not going to happen here.

        Lets fix the real bug: that rate limiting isnt just wrapping with a different IO impl! Then the original
        directory factory apis work fine with it: if you want a concurrent map you can manage this all yourself.

        If we want a sugar "RateLimitingDirectoryWrapper" thats fine. But its bogus this shit is in FSDir today.

        Show
        Robert Muir added a comment - Its absolutely reasonable. Otherwise the APIs are broken here. putting this concrete map on an abstract class that does nothing with it: thats not going to happen here. Lets fix the real bug: that rate limiting isnt just wrapping with a different IO impl! Then the original directory factory apis work fine with it: if you want a concurrent map you can manage this all yourself. If we want a sugar "RateLimitingDirectoryWrapper" thats fine. But its bogus this shit is in FSDir today.
        Hide
        Simon Willnauer added a comment - - edited

        here is another option just for the record. I removed all public API of RateLimiter from Directory & FSDirectory and made an overloaded createOutput method that can be called from a wrapper dir. I mean this is not 100% detached from dir but much cleaner than what we had before. I don't think we can simply have a rate limited IndexOutput since it will not play nice with BufferedIndexOutput. progress over perfection though...

        Show
        Simon Willnauer added a comment - - edited here is another option just for the record. I removed all public API of RateLimiter from Directory & FSDirectory and made an overloaded createOutput method that can be called from a wrapper dir. I mean this is not 100% detached from dir but much cleaner than what we had before. I don't think we can simply have a rate limited IndexOutput since it will not play nice with BufferedIndexOutput. progress over perfection though...
        Hide
        Robert Muir added a comment -

        I don't think ratelimiter should be in Directory at all: then its a bug if XYZdir doesnt implement it.

        Screw that: lets divorce this from directory completely as I suggested. Then it works with all directories,
        and they are totally unaware of it.

        I don't think we can simply have a rate limited IndexOutput since it will not play nice with BufferedIndexOutput.

        I think this is some premature optimization.

        APIs come first here!

        Show
        Robert Muir added a comment - I don't think ratelimiter should be in Directory at all: then its a bug if XYZdir doesnt implement it. Screw that: lets divorce this from directory completely as I suggested. Then it works with all directories, and they are totally unaware of it. I don't think we can simply have a rate limited IndexOutput since it will not play nice with BufferedIndexOutput. I think this is some premature optimization. APIs come first here!
        Hide
        Simon Willnauer added a comment -

        I think this is some premature optimization.

        I am not sure if that is premature. But I do agree it would be great if we could just wrap the IndexOutput to do this kind of stuff entirely outside of Directory. Maybe we can have a flush callback on BufferedIndexOutput we can hook into the flush call. This would also enable us to do some flush statistics which is independent of this issue. This could be an impl detail of BufferedIndexOutput but it would enable us to 1. do the opto we do today 2. divorce the rate limiting entirely from Dir.

        Show
        Simon Willnauer added a comment - I think this is some premature optimization. I am not sure if that is premature. But I do agree it would be great if we could just wrap the IndexOutput to do this kind of stuff entirely outside of Directory. Maybe we can have a flush callback on BufferedIndexOutput we can hook into the flush call. This would also enable us to do some flush statistics which is independent of this issue. This could be an impl detail of BufferedIndexOutput but it would enable us to 1. do the opto we do today 2. divorce the rate limiting entirely from Dir.
        Hide
        Michael McCandless added a comment -

        I think rate limiting merge IO is important functionality: merges
        easily kill search performance if you index/search on one box (NRT
        app).

        But I agree: Directory is abstract and minimal and we should keep it
        that way.

        A generic wrapper around any IO would be great ... but I'm not sure
        how we'd do it? EG, would we have to tally up our own bytes in every
        write method (writeInt/Long/VInt/VLong/etc.)? Maybe that's acceptable?
        It's only for writing ...

        Or maybe we only make RateLimitingBufferedIO subclass? Though I had
        wanted to try this with RAMDirectory too (playing with Zing)... I
        guess we could make a RateLimitingRAMOutputStream ...

        Show
        Michael McCandless added a comment - I think rate limiting merge IO is important functionality: merges easily kill search performance if you index/search on one box (NRT app). But I agree: Directory is abstract and minimal and we should keep it that way. A generic wrapper around any IO would be great ... but I'm not sure how we'd do it? EG, would we have to tally up our own bytes in every write method (writeInt/Long/VInt/VLong/etc.)? Maybe that's acceptable? It's only for writing ... Or maybe we only make RateLimitingBufferedIO subclass? Though I had wanted to try this with RAMDirectory too (playing with Zing)... I guess we could make a RateLimitingRAMOutputStream ...
        Hide
        Robert Muir added a comment -

        I didn't say it wasn't important.

        I guess, if its really important, then we'll invest the time to
        figure out clean APIs to support it. Otherwise we can remove it

        Show
        Robert Muir added a comment - I didn't say it wasn't important. I guess, if its really important, then we'll invest the time to figure out clean APIs to support it. Otherwise we can remove it
        Hide
        Simon Willnauer added a comment -

        here is a new patch detaching RateLimiting entirely from Directory and it's subclasses. RateLimitingDirectoryWrapper creates an IndexOutput wrapper if it knows of a Ratelimiter for the IOContext the IndexOutput is created for. RateLimitingIndexOutput subclasses BufferedIndexOutput and forwards / pauses on flush to the underlying IndexOutput. I also added an optimiztion for a Wrapped BufferedIndexOutput so FSDir uses will have the same experience as they had before just in a cleaner way API wise.

        Show
        Simon Willnauer added a comment - here is a new patch detaching RateLimiting entirely from Directory and it's subclasses. RateLimitingDirectoryWrapper creates an IndexOutput wrapper if it knows of a Ratelimiter for the IOContext the IndexOutput is created for. RateLimitingIndexOutput subclasses BufferedIndexOutput and forwards / pauses on flush to the underlying IndexOutput. I also added an optimiztion for a Wrapped BufferedIndexOutput so FSDir uses will have the same experience as they had before just in a cleaner way API wise.
        Hide
        Robert Muir added a comment -

        +1 to this approach!!!!

        Show
        Robert Muir added a comment - +1 to this approach!!!!
        Hide
        Michael McCandless added a comment -

        +1, this is a very clean solution!

        Thanks Simon.

        Show
        Michael McCandless added a comment - +1, this is a very clean solution! Thanks Simon.
        Hide
        Simon Willnauer added a comment -

        thanks guys!

        committed to trunk in rev. 1407268

        backported to 4x in rev. 1407271

        Show
        Simon Willnauer added a comment - thanks guys! committed to trunk in rev. 1407268 backported to 4x in rev. 1407271
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Simon Willnauer
        http://svn.apache.org/viewvc?view=revision&revision=1407288

        LUCENE-4537: Separate RateLimiter from Directory

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Simon Willnauer http://svn.apache.org/viewvc?view=revision&revision=1407288 LUCENE-4537 : Separate RateLimiter from Directory
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Simon Willnauer
        http://svn.apache.org/viewvc?view=revision&revision=1407271

        LUCENE-4537: Separate RateLimiter from Directory

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Simon Willnauer http://svn.apache.org/viewvc?view=revision&revision=1407271 LUCENE-4537 : Separate RateLimiter from Directory

          People

          • Assignee:
            Unassigned
            Reporter:
            Simon Willnauer
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development