Lucene - Core
  1. Lucene - Core
  2. LUCENE-1909

Make IndexReader.DEFAULT_TERMS_INDEX_DIVISOR public

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Trivial Trivial
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New
    1. LUCENE_1909.patch
      1 kB
      Simon Willnauer

      Activity

      Hide
      Simon Willnauer added a comment -

      Grant, are you going to change this for 3.0? If not we should move it out to 3.1. If we make it public the constant should be final though. I attached a patch.

      thanks, simon

      Show
      Simon Willnauer added a comment - Grant, are you going to change this for 3.0? If not we should move it out to 3.1. If we make it public the constant should be final though. I attached a patch. thanks, simon
      Hide
      Simon Willnauer added a comment -

      patch with Javadoc

      Show
      Simon Willnauer added a comment - patch with Javadoc
      Hide
      Uwe Schindler added a comment -

      I think, we should have this in 3.0, it's not a big deal.

      Show
      Uwe Schindler added a comment - I think, we should have this in 3.0, it's not a big deal.
      Hide
      Michael McCandless added a comment -

      +1 for 3.0

      Show
      Michael McCandless added a comment - +1 for 3.0
      Hide
      Simon Willnauer added a comment -

      @Grant: can somebody else take this?

      Show
      Simon Willnauer added a comment - @Grant: can somebody else take this?
      Hide
      Uwe Schindler added a comment -

      Simon: Should I take & commit it?

      Show
      Uwe Schindler added a comment - Simon: Should I take & commit it?
      Hide
      Mark Miller added a comment -

      umm ... call me crazy, but why are we making this public?

      Show
      Mark Miller added a comment - umm ... call me crazy, but why are we making this public?
      Hide
      Grant Ingersoll added a comment -

      Because it's nice to know what the default is and be able to refer to that w/o having to hardcode a magic number somewhere else in your app when it is already hardcoded here.

      Show
      Grant Ingersoll added a comment - Because it's nice to know what the default is and be able to refer to that w/o having to hardcode a magic number somewhere else in your app when it is already hardcoded here.
      Hide
      Uwe Schindler added a comment -

      +1, I have the same opinion

      Show
      Uwe Schindler added a comment - +1, I have the same opinion
      Hide
      Mark Miller added a comment -

      Whats the use case?

      If if the default was different than 1, I'd still think this was weird - but ...

      But the default is 1 - load them all. Seems weird to me to broadcast this beyond documentation. It locks us into that field and I can't see the benefit myself. You have a specific use case that this makes sense for?

      Show
      Mark Miller added a comment - Whats the use case? If if the default was different than 1, I'd still think this was weird - but ... But the default is 1 - load them all. Seems weird to me to broadcast this beyond documentation. It locks us into that field and I can't see the benefit myself. You have a specific use case that this makes sense for?
      Hide
      Mark Miller added a comment -

      Okay - 2v1 I guess - but I think you guys have been eating too many of the crazy pills This is a weird default to expose.

      Show
      Mark Miller added a comment - Okay - 2v1 I guess - but I think you guys have been eating too many of the crazy pills This is a weird default to expose.
      Hide
      Uwe Schindler added a comment -

      Hm, I only bought one 500 pills Aspirin package for 3$ in Oakland...

      Show
      Uwe Schindler added a comment - Hm, I only bought one 500 pills Aspirin package for 3$ in Oakland...
      Hide
      Michael McCandless added a comment -

      I sort of agree with Mark, now... the default is "don't subsample the terms index", so it is rather odd to make this public.

      I think it does make sense to have a private named constant for this (instead of "1" all throughout the code) for code readability.

      Yet, Lucene does do this "static final public default" constant in a number of places.

      My guess is this is a Solr need, right? Ie, Solr is exposing control over terms index divisor, and would like to set a default for that param, and so the natural thing to do is wire it to Lucene's default. Outside Solr, it's hard to imagine why apps would need to (should) refer to this constant.

      Show
      Michael McCandless added a comment - I sort of agree with Mark, now... the default is "don't subsample the terms index", so it is rather odd to make this public. I think it does make sense to have a private named constant for this (instead of "1" all throughout the code) for code readability. Yet, Lucene does do this "static final public default" constant in a number of places. My guess is this is a Solr need, right? Ie, Solr is exposing control over terms index divisor, and would like to set a default for that param, and so the natural thing to do is wire it to Lucene's default. Outside Solr, it's hard to imagine why apps would need to (should) refer to this constant.
      Hide
      Grant Ingersoll added a comment - - edited

      Boy, I had an awesome use case when I wrote this up, but am forgetting it now. I think it was something like:

      If you are examining the index ahead of time, you could have heuristics about how much memory you want the system to use when loading a reader. Thus, I could have a clause in my IndexReader code that did something like:

      if (index is big or memory is tight){
      reader = IndexReader.open(..., largerTermsIndexDiv);
      } else{
      reader = IndexReader.opent(..., DEFAULT_TERMS_INDEX_DIVISOR);
      }
      

      Hard coding a 1 in that else clause seems like a bad thing to do given there is already a "default" value spec. by Lucene.

      Also, since it is package private, if I extend IndexReader, I won't have access to it.

      At the end of the day, it's not a big deal.

      Show
      Grant Ingersoll added a comment - - edited Boy, I had an awesome use case when I wrote this up, but am forgetting it now. I think it was something like: If you are examining the index ahead of time, you could have heuristics about how much memory you want the system to use when loading a reader. Thus, I could have a clause in my IndexReader code that did something like: if (index is big or memory is tight){ reader = IndexReader.open(..., largerTermsIndexDiv); } else { reader = IndexReader.opent(..., DEFAULT_TERMS_INDEX_DIVISOR); } Hard coding a 1 in that else clause seems like a bad thing to do given there is already a "default" value spec. by Lucene. Also, since it is package private, if I extend IndexReader, I won't have access to it. At the end of the day, it's not a big deal.
      Hide
      Mark Miller added a comment -

      I know its not a big deal - but I have an abrasive reputation to maintain

      The default terms info divisor essentially means that feature is turned off. The 1 just means, don't sub sample the terms index. Its off.

      For your use case, I would do:

      if (index is big or memory is tight){
        reader = IndexReader.open(..., largerTermsIndexDiv);
      } else{
        reader = IndexReader.open(...);
      }
      

      If you don't use that argument, you get the default. Whenever I see an issue to open up something in Lucene without a reason that makes sense to me, I think its a good idea to push back. If it really does make sense, it will make it through - but I don't think anything should get a free pass. This one is still not making sense to me.

      Your right - its not a big deal - but I still don't see a reason to open it up.

      Show
      Mark Miller added a comment - I know its not a big deal - but I have an abrasive reputation to maintain The default terms info divisor essentially means that feature is turned off. The 1 just means, don't sub sample the terms index. Its off. For your use case, I would do: if (index is big or memory is tight){ reader = IndexReader.open(..., largerTermsIndexDiv); } else { reader = IndexReader.open(...); } If you don't use that argument, you get the default. Whenever I see an issue to open up something in Lucene without a reason that makes sense to me, I think its a good idea to push back. If it really does make sense, it will make it through - but I don't think anything should get a free pass. This one is still not making sense to me. Your right - its not a big deal - but I still don't see a reason to open it up.
      Hide
      Tim Smith added a comment -

      I have the following use case:

      i have a configuration bean, this bean can be customized via xml at config time
      in this bean, i expose the setting for the terms index divisor
      so, my bean has to have a default value for this,

      right now, i just use 1 for the default value.
      would be nice if i could just use the lucene constant instead of using 1, as the lucene constant could change in the future (not really likely, but its one less constant i have to maintain)

      if the default is not made public i have 2 options:

      1. use a hard coded constant in my code for the default value (doing this right now)
      2. use an Integer object, and have null be the default

      the nasty part about the second option is that i now have to do conditional opening of the reader depending on if null is the value (unset), when it would be much simpler (and easier for me to maintain), if i just always pass in that value

      Show
      Tim Smith added a comment - I have the following use case: i have a configuration bean, this bean can be customized via xml at config time in this bean, i expose the setting for the terms index divisor so, my bean has to have a default value for this, right now, i just use 1 for the default value. would be nice if i could just use the lucene constant instead of using 1, as the lucene constant could change in the future (not really likely, but its one less constant i have to maintain) if the default is not made public i have 2 options: use a hard coded constant in my code for the default value (doing this right now) use an Integer object, and have null be the default the nasty part about the second option is that i now have to do conditional opening of the reader depending on if null is the value (unset), when it would be much simpler (and easier for me to maintain), if i just always pass in that value
      Hide
      Mark Miller added a comment -

      Okay - fair enough - there is a use case I won't try and argue.

      I will say that I don't see the default ever changing - it would change on the IndexWriter side - I see this as an extra feature and 1 just means its off - but far be it from me to make an injectors life harder.

      Show
      Mark Miller added a comment - Okay - fair enough - there is a use case I won't try and argue. I will say that I don't see the default ever changing - it would change on the IndexWriter side - I see this as an extra feature and 1 just means its off - but far be it from me to make an injectors life harder.
      Hide
      Mark Miller added a comment -

      Eh - still bugging me. If you are allowing users to set this in XML, do you really want to hide what the setting is unless they dig into Lucene internal code?

      As a user, I'm thinking - oh, nice - I can change this - but what am I changing it from? Oh, I have to go dig into Lucene source to see. So you likely want to put in a comment or something saying what the default is - but then you should just put the value, else your still maintaining. It doesnt make sense to have a user go - uh - I'll change this from the unknown default to ... 7!

      This use case hasn't sold me yet after all either. Whatever though - I wouldn't -1 it. "Its not a big deal" is something I've already agreed with.

      Show
      Mark Miller added a comment - Eh - still bugging me. If you are allowing users to set this in XML, do you really want to hide what the setting is unless they dig into Lucene internal code? As a user, I'm thinking - oh, nice - I can change this - but what am I changing it from? Oh, I have to go dig into Lucene source to see. So you likely want to put in a comment or something saying what the default is - but then you should just put the value, else your still maintaining. It doesnt make sense to have a user go - uh - I'll change this from the unknown default to ... 7! This use case hasn't sold me yet after all either. Whatever though - I wouldn't -1 it. "Its not a big deal" is something I've already agreed with.
      Hide
      Tim Smith added a comment -

      users can see the live setting via things like JMX/admin ui
      also, if i intend users to actually change the value regularly, i can provide user facing documentation that would go into detail without the user needing to dig further into lucene internals (memory tuning guide or something)
      currently just exposing this setting myself as a SUPER ADVANCED setting (just in case it will need to be tuned for custom use cases in the future) (can't tune it if its not exposed in config)

      Show
      Tim Smith added a comment - users can see the live setting via things like JMX/admin ui also, if i intend users to actually change the value regularly, i can provide user facing documentation that would go into detail without the user needing to dig further into lucene internals (memory tuning guide or something) currently just exposing this setting myself as a SUPER ADVANCED setting (just in case it will need to be tuned for custom use cases in the future) (can't tune it if its not exposed in config)
      Hide
      Yonik Seeley added a comment -

      My guess is this is a Solr need, right?

      I don't think so... IMO, Solr should do it like a normal user should do it - pass it to open().
      IMO, exposing global config isn't needed or desirable (i.e. I agree with Mark)

      Show
      Yonik Seeley added a comment - My guess is this is a Solr need, right? I don't think so... IMO, Solr should do it like a normal user should do it - pass it to open(). IMO, exposing global config isn't needed or desirable (i.e. I agree with Mark)
      Hide
      Mark Miller added a comment -

      A good admin UI should have this option off by default - ie no terms divisor set on the indexreader side. There is no need to display a number to the user - the option is not in effect by default. Thats always going to make sense. Unless you make your own default and then this is all moot anyway. If Lucene ever decided to change this type of thing, it would happen on the index side. The admin UI should let you enable a terminfodivisor and set a value greater than 1, or turn of the terms index entirely. If a user hasnt enabled it, use the constructors/getters that don't have that param - if they enable it, use the constructors/getters that have that param.

      Not sold. Not sold for a need with JMX either.

      Show
      Mark Miller added a comment - A good admin UI should have this option off by default - ie no terms divisor set on the indexreader side. There is no need to display a number to the user - the option is not in effect by default. Thats always going to make sense. Unless you make your own default and then this is all moot anyway. If Lucene ever decided to change this type of thing, it would happen on the index side. The admin UI should let you enable a terminfodivisor and set a value greater than 1, or turn of the terms index entirely. If a user hasnt enabled it, use the constructors/getters that don't have that param - if they enable it, use the constructors/getters that have that param. Not sold. Not sold for a need with JMX either.
      Hide
      Tim Smith added a comment -

      what you describe requires effectively 2 settings:

      • custom term infos divisor enabled/disabled
      • configured value if enabled

      this then results in more complexity in opening the index reader (conditional opening where a non-conditional open with the configured divisor would do the trick)
      any admin ui would also require more conditional handling of displaying this setting (as you described) (i'm not displaying it other than in JMX now anyway, so it doesn't really matter for me, and JMX just has a readonly attribute that shows the configured value (1 by default))

      personally, i don't care too much if this constant is made public or not (would make it so i use that constant instead of defining my own with the same value), so it only saves me 1 line (and its not like the default will ever change from 1 in the lucene code anyway)

      Show
      Tim Smith added a comment - what you describe requires effectively 2 settings: custom term infos divisor enabled/disabled configured value if enabled this then results in more complexity in opening the index reader (conditional opening where a non-conditional open with the configured divisor would do the trick) any admin ui would also require more conditional handling of displaying this setting (as you described) (i'm not displaying it other than in JMX now anyway, so it doesn't really matter for me, and JMX just has a readonly attribute that shows the configured value (1 by default)) personally, i don't care too much if this constant is made public or not (would make it so i use that constant instead of defining my own with the same value), so it only saves me 1 line (and its not like the default will ever change from 1 in the lucene code anyway)
      Hide
      Mark Miller added a comment - - edited

      heh - well, if you want to be lazy with the admin code, you can just plug in 1. Do you really need a Lucene constant to tell you that a divisor of 1 is no divisor?

      eg - Lucene is not using that default in a way that you guys are arguing users would want to - its using it to set the thing as off. Perhaps it shouldnt have default in its name - being internal, it kind of doesn't matter though. Really its just saying, the divisor is off. You don't really need to worry about what the number is - its just avoiding a magic number in Lucene code.

      eg - there is no default number as far as the user should be concerned. The default is that the IndexReader divisor feature is off.

      Show
      Mark Miller added a comment - - edited heh - well, if you want to be lazy with the admin code, you can just plug in 1. Do you really need a Lucene constant to tell you that a divisor of 1 is no divisor? eg - Lucene is not using that default in a way that you guys are arguing users would want to - its using it to set the thing as off. Perhaps it shouldnt have default in its name - being internal, it kind of doesn't matter though. Really its just saying, the divisor is off. You don't really need to worry about what the number is - its just avoiding a magic number in Lucene code. eg - there is no default number as far as the user should be concerned. The default is that the IndexReader divisor feature is off.
      Hide
      Uwe Schindler added a comment - - edited

      By the way, if you use a final constant, without recompiling it would never change, because even Java puts such constants directly as plain value into class files... - so a change in the JAR file of a new Lucene Version would have no effect. See the problem with Constants.LUCENE_MAIN_VERSION (which is solved from 2.9.1 on)

      Show
      Uwe Schindler added a comment - - edited By the way, if you use a final constant, without recompiling it would never change, because even Java puts such constants directly as plain value into class files... - so a change in the JAR file of a new Lucene Version would have no effect. See the problem with Constants.LUCENE_MAIN_VERSION (which is solved from 2.9.1 on)
      Hide
      Tim Smith added a comment -

      Only thing i would want the constant for is to known what the default divisor is. The default just happens to be 1 (no divisor/off).

      However (while unlikely) a new version of lucene could default to using a real divisor (maybe once everyone is on solid state disks, a higher divisor will result in the same speed of access, with less memory use), at which point, if i upgrade to a new version of lucene, i want to inherit that changed setting (as the default was selected by people that probably know better than me what will better server the general use of lucene in terms of memory and performance)

      right now, if i want to inherit the default i would have to do a conditional IndexReader.open() and store my setting as a pair (enabled/disabled, divisor), which could be encoded in an Integer object (null = disabled/use lucene default)

      if the constant is made public, its easier for me to inherit that default setting.
      of course at the end of the day, either approach will only be about 5 lines of code difference, so again, i don't really care too much about the outcome of this

      By the way, if you use a final constant, without recompiling it would never change,...

      I never drop a new lucene in without recompiling (so that doesn't cause any difference for me)

      Show
      Tim Smith added a comment - Only thing i would want the constant for is to known what the default divisor is. The default just happens to be 1 (no divisor/off). However (while unlikely) a new version of lucene could default to using a real divisor (maybe once everyone is on solid state disks, a higher divisor will result in the same speed of access, with less memory use), at which point, if i upgrade to a new version of lucene, i want to inherit that changed setting (as the default was selected by people that probably know better than me what will better server the general use of lucene in terms of memory and performance) right now, if i want to inherit the default i would have to do a conditional IndexReader.open() and store my setting as a pair (enabled/disabled, divisor), which could be encoded in an Integer object (null = disabled/use lucene default) if the constant is made public, its easier for me to inherit that default setting. of course at the end of the day, either approach will only be about 5 lines of code difference, so again, i don't really care too much about the outcome of this By the way, if you use a final constant, without recompiling it would never change,... I never drop a new lucene in without recompiling (so that doesn't cause any difference for me)
      Hide
      Mark Miller added a comment -

      If you want to inherit the setting, use the correct constructor

      Lucene doesn't advertise a default setting now, because the default is off - you only know we are using a 1 divisor because you are peeking. We could switch to not using that constant at all right now. Nice to have flexibility.

      However (while unlikely) a new version of lucene could default to using a real divisor (maybe once everyone is on solid state disks, a higher divisor will result in the same speed of access, with less memory use)

      We wouldn't change it there by default. Thats a waste on the index size side. We would change the IndexWriter divisor. We wouldn't waste index size on disk for most people just so some people could lower it back to 1 to get there speed back. I'd -1 the heck out of that. There is no default for you to inherit in any case - thats an impl detail. By the default the feature is off. You can't inherit anything about it.

      Show
      Mark Miller added a comment - If you want to inherit the setting, use the correct constructor Lucene doesn't advertise a default setting now, because the default is off - you only know we are using a 1 divisor because you are peeking. We could switch to not using that constant at all right now. Nice to have flexibility. However (while unlikely) a new version of lucene could default to using a real divisor (maybe once everyone is on solid state disks, a higher divisor will result in the same speed of access, with less memory use) We wouldn't change it there by default. Thats a waste on the index size side. We would change the IndexWriter divisor. We wouldn't waste index size on disk for most people just so some people could lower it back to 1 to get there speed back. I'd -1 the heck out of that. There is no default for you to inherit in any case - thats an impl detail. By the default the feature is off. You can't inherit anything about it.
      Hide
      Tim Smith added a comment -

      If you want to inherit the setting, use the correct constructor

      agreed, just a tiny bit of more complexity on my side for that (but its so insignificant that it doesn't really matter, and is really not even worth arguing over)

      if the constant was public, i'd use it, if not, no worries (for me at least)

      By the default the feature is off. You can't inherit anything about it.

      ideally, i want to inherit that the feature is off by default, then allow config to turn it on (by providing a value greater than one for this setting, or just 1 to allow config to explicitly disable)
      using the constructor with no divisor does this (i just need to call the constructor conditionally depending on if the setting was explicitly configured), thats easy and is no problem to do at all (just a couple of extra lines of code in my app layer)

      Show
      Tim Smith added a comment - If you want to inherit the setting, use the correct constructor agreed, just a tiny bit of more complexity on my side for that (but its so insignificant that it doesn't really matter, and is really not even worth arguing over) if the constant was public, i'd use it, if not, no worries (for me at least) By the default the feature is off. You can't inherit anything about it. ideally, i want to inherit that the feature is off by default, then allow config to turn it on (by providing a value greater than one for this setting, or just 1 to allow config to explicitly disable) using the constructor with no divisor does this (i just need to call the constructor conditionally depending on if the setting was explicitly configured), thats easy and is no problem to do at all (just a couple of extra lines of code in my app layer)
      Hide
      Uwe Schindler added a comment -

      I think I move this to 3.1 and then we have enough time to close it or not.

      Show
      Uwe Schindler added a comment - I think I move this to 3.1 and then we have enough time to close it or not.
      Hide
      Steve Rowe added a comment -

      Bulk move 4.4 issues to 4.5 and 5.0

      Show
      Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
      Hide
      Uwe Schindler added a comment -

      Move issue to Lucene 4.9.

      Show
      Uwe Schindler added a comment - Move issue to Lucene 4.9.

        People

        • Assignee:
          Unassigned
          Reporter:
          Grant Ingersoll
        • Votes:
          0 Vote for this issue
          Watchers:
          1 Start watching this issue

          Dates

          • Created:
            Updated:

            Development