Lucene - Core
  LUCENE-3416

Allow passing an instance of RateLimiter to FSDirectory, allowing merge IO to be rate limited across several directories / instances

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/store
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      This can come in handy when running several Lucene indices in the same VM and wishing to rate limit merge IO across all of them.
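
      For illustration, a rough sketch of how a shared limiter might be wired up with this change, using the FSDirectory and RateLimiter APIs as they existed on trunk at the time; setMaxMergeWriteLimiter is a placeholder name for whatever setter the patch actually adds:

        import java.io.File;
        import org.apache.lucene.store.FSDirectory;
        import org.apache.lucene.store.RateLimiter;

        public class SharedMergeRateLimit {
          public static void main(String[] args) throws Exception {
            // One limiter shared by every index in this VM, so merges across
            // all of them together stay under ~20 MB/sec.
            RateLimiter sharedMergeLimiter = new RateLimiter(20.0);

            FSDirectory dirA = FSDirectory.open(new File("/data/index-a"));
            FSDirectory dirB = FSDirectory.open(new File("/data/index-b"));

            // Placeholder for the setter this issue proposes; the real name may differ.
            dirA.setMaxMergeWriteLimiter(sharedMergeLimiter);
            dirB.setMaxMergeWriteLimiter(sharedMergeLimiter);
          }
        }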

      Attachments

      1. LUCENE-3416.patch (1.0 kB) by Shay Banon
      2. LUCENE-3416.patch (4 kB) by Shay Banon

        Activity

        Simon Willnauer added a comment -

        Shay, can't you use an Input / Output wrapper on a RateLimitingDirectoryDelegate? With Lucene 4.0 you get the IOContext when opening / creating streams, so you can decide based on that.
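
        For context, the wrapper approach would look roughly like the sketch below. FilterDirectory is used here only to keep the sketch short (a real delegate on 4.0 trunk would forward every Directory method by hand), and RateLimitedIndexOutput stands in for a hypothetical IndexOutput wrapper that calls RateLimiter.pause() as bytes are written:

          import java.io.IOException;
          import org.apache.lucene.store.Directory;
          import org.apache.lucene.store.FilterDirectory;
          import org.apache.lucene.store.IOContext;
          import org.apache.lucene.store.IndexOutput;
          import org.apache.lucene.store.RateLimiter;

          // Sketch only: RateLimitedIndexOutput is a hypothetical wrapper that
          // pauses via RateLimiter.pause(bytes) inside writeByte/writeBytes.
          public class RateLimitingDirectoryDelegate extends FilterDirectory {
            private final RateLimiter mergeRateLimiter;

            public RateLimitingDirectoryDelegate(Directory delegate, RateLimiter mergeRateLimiter) {
              super(delegate);
              this.mergeRateLimiter = mergeRateLimiter;
            }

            @Override
            public IndexOutput createOutput(String name, IOContext context) throws IOException {
              IndexOutput output = in.createOutput(name, context);
              // The IOContext tells us whether this file is being written by a merge.
              if (context.context == IOContext.Context.MERGE) {
                return new RateLimitedIndexOutput(output, mergeRateLimiter);
              }
              return output;
            }
          }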

        Shay Banon added a comment -

        It is possible, but it requires more work and depends on overriding the createOutput method (as well as all the other methods in Directory). If rate limiting makes sense to expose as a "feature" at the directory level, I think this small change allows for greater control over it.

        Simon Willnauer added a comment -

        I see, I guess that is kind of overkill here. This patch looks fine to me, though I wonder why this needs to be synchronized since we don't read it from a synced block. If you want this to take immediate effect, shouldn't you rather use volatile here? I doubt that is necessary in this context - I'd rather not invalidate a cache line for each IndexOutput creation.

        Shay Banon added a comment -

        The only reason it's synchronized is because the setMaxMergeWriteMBPerSec method is synchronized (I guess to protect against setting the rate limit concurrently). In practice, I don't see users changing it that often, so concerns about cache lines are not really relevant.

        Simon Willnauer added a comment -

        > The only reason it's synchronized is because the setMaxMergeWriteMBPerSec method is synchronized (I guess to protect against setting the rate limit concurrently). In practice, I don't see users changing it that often, so concerns about cache lines are not really relevant.

        This makes no sense to me. If you don't want to set this concurrently, how does a lock protect you from that? I mean, if you have two threads accessing this you have either A B or B A, but that would happen without a lock too. If you want the changes to take effect immediately, you need to either lock on each read of this var or make it volatile, which is almost equivalent (a mem barrier).

        My concern here was related to making this var volatile, which would mean a cache line invalidation each time you read the var. I think we should get rid of the synchronized.
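
        To make the visibility point concrete, here is a minimal illustration of the two options under discussion; this is not Lucene code, and the Object field just stands in for the mergeWriteRateLimiter reference:

          // Minimal illustration of the trade-off being debated; not Lucene code.
          class SyncedWriteUnsyncedRead {
            private Object limiter;

            // Writers are serialized by the lock, but a reader that takes no lock
            // has no happens-before edge and may keep seeing a stale reference.
            public synchronized void setLimiter(Object l) { this.limiter = l; }
            public Object readWithoutLock()               { return limiter; }
          }

          class VolatileField {
            // volatile gives every reader the most recent write without any lock,
            // at the cost of a memory barrier on each access.
            private volatile Object limiter;

            public void setLimiter(Object l) { this.limiter = l; }
            public Object read()             { return limiter; }
          }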

        Shay Banon added a comment -

        > This makes no sense to me. If you don't want to set this concurrently, how does a lock protect you from that? I mean, if you have two threads accessing this you have either A B or B A, but that would happen without a lock too. If you want the changes to take effect immediately, you need to either lock on each read of this var or make it volatile, which is almost equivalent (a mem barrier).

        No, that's not correct. setMaxMergeWriteMBPerSec (not the method I added, the other one) is a complex method, and I think Mike wanted to protect against two threads setting the value concurrently. As for reading the value, I think Mike's logic was that it's not important enough to have "immediate" visibility of the change to require a volatile field (which is understandable). So, since setMaxMergeWriteMBPerSec is synchronized, the method added in this patch has to be as well.

        > My concern here was related to making this var volatile, which would mean a cache line invalidation each time you read the var. I think we should get rid of the synchronized.

        Reading a volatile var on x86 is not a cache invalidation, though it does come with a cost. It's not relevant here based on what I explained before (and second-guessing Mike).

        Simon Willnauer added a comment -

        > Reading a volatile var on x86 is not a cache invalidation, though it does come with a cost.

        Yes, unless you are reading and writing it. Yet since that is not the intention here, we don't need to get into details. Bottom line: the synchronization doesn't make much sense here. We should not try to control an ordering here that we cannot control; if somebody sets either of those concurrently, they should take care of synchronization themselves. What should I expect if you set this concurrently? IMO we should craft those methods to work without synchronization, which is certainly possible. synchronized here buys us nothing and should be removed.

        Michael McCandless added a comment -

        I made setMaxMergeWriteMBPerSec sync'd because it has tricky logic about creating a new RateLimiter, or reusing an existing one (so current merges can see the change on a "best effort" basis), or clearing the current one... so I just wanted to be safe and have only one thread doing this at once; else, e.g., you could get an NPE, I think.

        But probably we should in fact make mergeWriteRateLimiter volatile, just to make sure the unsync'd read sees the current value? The minor added read cost should be negligible vs. the overhead of creating / writing / closing files.
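
        For what it's worth, one way the NPE Mike mentions could sneak in without the lock is a double read of the field; reading it once into a local (the pattern Simon's snippet below uses) avoids it. A minimal illustration, not Lucene code, with Limiter standing in for RateLimiter:

          // Illustration only: why reading the field twice is risky once the
          // synchronization is gone, and why a local copy is safe.
          class Limiter {                      // stand-in for RateLimiter
            void setMaxRate(double mbPerSec) {}
          }

          class DoubleReadVsLocalCopy {
            private volatile Limiter mergeWriteRateLimiter;

            public void clearRisky() {
              if (mergeWriteRateLimiter != null) {                   // read #1
                // another thread may null the field right here...
                mergeWriteRateLimiter.setMaxRate(Double.MAX_VALUE);  // read #2: possible NPE
                mergeWriteRateLimiter = null;
              }
            }

            public void clearSafe() {
              final Limiter limiter = mergeWriteRateLimiter;         // single read into a local
              if (limiter != null) {
                limiter.setMaxRate(Double.MAX_VALUE);                // works on the local copy
                mergeWriteRateLimiter = null;
              }
            }
          }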

        Shay Banon added a comment -

        I agree with Mike, I think it should remain synchronized; it does safeguard concurrent calls to setMaxMergeWriteMBPerSec from falling over each other (who "wins" the call is not really relevant). Since that's synchronized, the method I added should be as well. Personally, I really don't think there is a need to make it thread safe without "blocking", since calling the "setters" is not something people do frequently at all, so the optimization is moot, and it will complicate the code.

        As for making mergeWriteRateLimiter volatile, it can be done, though in practice there really is no need to do it (there is a memory barrier before it is read anyway). But I think that should go into a different issue, just to keep changes clean and isolated?

        Simon Willnauer added a comment -

        What about something like this:

          public void setMaxMergeWriteMBPerSec(Double mbPerSec) {
            final RateLimiter limiter = mergeWriteRateLimiter;
            if (mbPerSec == null) {
              if (limiter != null) {
                limiter.setMaxRate(Double.MAX_VALUE);
                mergeWriteRateLimiter = null;
              }
            } else if (limiter != null) {
              limiter.setMaxRate(mbPerSec);
            } else {
              mergeWriteRateLimiter = new RateLimiter(mbPerSec);
            }
          }
        
          /** See {@link #setMaxMergeWriteMBPerSec}.
           *
           * @lucene.experimental */
          public Double getMaxMergeWriteMBPerSec() {
            final RateLimiter limiter = mergeWriteRateLimiter;
            return limiter == null ? null : limiter.getMaxRateMB();
          }
        

        I think we can keep this simple without any synchronization, which is totally unnecessary here.

        Michael McCandless added a comment -

        That change looks fine to me, Simon.

        Shay Banon added a comment -

        I must say that I am at a loss trying to understand why we need this "optimization", but it does not really matter to me as long as the ability to set the rate limiter instance gets in.

        Shay Banon added a comment -

        A new patch, removing the synchronization. It also adds another field to RateLimiter to record the original mbPerSec value set, so we can easily get it back.
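
        Roughly, the RateLimiter side of that change might look like the sketch below; field and method names are guesses based on the snippet above (getMaxRateMB), not necessarily the committed code, and pause(bytes) is omitted:

          // Rough sketch, not the committed patch: record the configured MB/sec
          // so it can be returned later, alongside the derived ns-per-byte value.
          public class RateLimiter {
            private volatile double mbPerSec;   // remembers the configured rate
            private volatile double nsPerByte;  // derived value actually used for pausing

            public RateLimiter(double mbPerSec) {
              setMaxRate(mbPerSec);
            }

            public void setMaxRate(double mbPerSec) {
              this.mbPerSec = mbPerSec;
              this.nsPerByte = 1000000000.0 / (1024 * 1024 * mbPerSec);
            }

            /** Returns the most recently set MB/sec rate. */
            public double getMaxRateMB() {
              return mbPerSec;
            }
          }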

        Simon Willnauer added a comment -

        That looks good to me, thanks Shay!

        Simon Willnauer added a comment -

        I am planning to commit this today, any objections?

        Simon Willnauer added a comment -

        Committed to trunk. Thanks Shay!


          People

           • Assignee: Simon Willnauer
           • Reporter: Shay Banon
           • Votes: 0
           • Watchers: 1
