Lucene - Core
  1. Lucene - Core
  2. LUCENE-3474

pass liveDocs Bits down in scorercontext, instead of Weights pulling from the reader

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from LUCENE-1536, this would allow filters to work in a more flexible way (besides just cleaning up)

      1. LUCENE-3474.patch
        73 kB
        Robert Muir
      2. LUCENE-3474.patch
        71 kB
        Robert Muir
      3. LUCENE-3474.patch
        53 kB
        Robert Muir
      4. LUCENE-3474.patch
        52 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        updated patch with javadocs for the acceptDocs, and i had neglected the MatchAllDocsScorer.

        all tests pass

        Show
        Robert Muir added a comment - updated patch with javadocs for the acceptDocs, and i had neglected the MatchAllDocsScorer. all tests pass
        Hide
        Michael McCandless added a comment -

        Patch looks great! Nice to separate this from LUCENE-1536; after this patch, LUCENE-1536 will be tiny!

        Show
        Michael McCandless added a comment - Patch looks great! Nice to separate this from LUCENE-1536 ; after this patch, LUCENE-1536 will be tiny!
        Hide
        Chris Male added a comment -

        Super, lets go ahead and commit this.

        Show
        Chris Male added a comment - Super, lets go ahead and commit this.
        Hide
        Simon Willnauer added a comment -

        I kind of liked the ScorerContext#topScorer(boolean) builder like pattern which is way less verbose than using ctors. Can we add those methods back and force a copy on setAcceptsDocs(Bits) that way we don't need to copy all settings on an incoming context.

        Show
        Simon Willnauer added a comment - I kind of liked the ScorerContext#topScorer(boolean) builder like pattern which is way less verbose than using ctors. Can we add those methods back and force a copy on setAcceptsDocs(Bits) that way we don't need to copy all settings on an incoming context.
        Hide
        Robert Muir added a comment -

        I really hated that thing: Its strange and unintuitive: half the code to the class is explaining the "pattern" that the class uses and how to use it.

        Not good!

        Show
        Robert Muir added a comment - I really hated that thing: Its strange and unintuitive: half the code to the class is explaining the "pattern" that the class uses and how to use it. Not good!
        Hide
        Simon Willnauer added a comment -

        Its strange and unintuitive: half the code to the class is explaining the "pattern" that the class uses and how to use it.

        very straight forward IMO. Well since it used to be that way we should keep it. Just changing stuff like this for "taste" reasons is not acceptable for me though. We already figured out that we have the "builder" vs. "no-builder" camp have here in Lucene and I don't want to fight this every time something like that comes up.

        I really hated that thing

        thats fine I hate the ctor verbosity so we are deadlocking here. Again

        Show
        Simon Willnauer added a comment - Its strange and unintuitive: half the code to the class is explaining the "pattern" that the class uses and how to use it. very straight forward IMO. Well since it used to be that way we should keep it. Just changing stuff like this for "taste" reasons is not acceptable for me though. We already figured out that we have the "builder" vs. "no-builder" camp have here in Lucene and I don't want to fight this every time something like that comes up. I really hated that thing thats fine I hate the ctor verbosity so we are deadlocking here. Again
        Hide
        Robert Muir added a comment -

        Just changing stuff like this for "taste" reasons is not acceptable for me though.

        Really? we can't write patches that change code in trunk any more?

        Show
        Robert Muir added a comment - Just changing stuff like this for "taste" reasons is not acceptable for me though. Really? we can't write patches that change code in trunk any more?
        Hide
        Robert Muir added a comment -

        By the way, I didnt change this for taste reasons.

        I want the Bitset to be required to build a ScorerContext (Note both ctors take it!)

        So this pretty much blew the existing "pattern" out of the water.

        Show
        Robert Muir added a comment - By the way, I didnt change this for taste reasons. I want the Bitset to be required to build a ScorerContext (Note both ctors take it!) So this pretty much blew the existing "pattern" out of the water.
        Hide
        Simon Willnauer added a comment -

        I want the Bitset to be required to build a ScorerContext (Note both ctors take it!)

        rename def() to create(Bits), done.

        Show
        Simon Willnauer added a comment - I want the Bitset to be required to build a ScorerContext (Note both ctors take it!) rename def() to create(Bits), done.
        Hide
        Simon Willnauer added a comment -

        So this pretty much blew the existing "pattern" out of the water.

        boolean boolean params are just another start of a big mess here. the named builder like methods here make it very explicit what you are doing here. If we gonna add more which is likely we gonna end up with more boolean params people need to get in the right order. The chance to introduce an error here is way less than with a ctor.

        Show
        Simon Willnauer added a comment - So this pretty much blew the existing "pattern" out of the water. boolean boolean params are just another start of a big mess here. the named builder like methods here make it very explicit what you are doing here. If we gonna add more which is likely we gonna end up with more boolean params people need to get in the right order. The chance to introduce an error here is way less than with a ctor.
        Hide
        Robert Muir added a comment -

        Simon i have never seen code with this pattern before.

        The chance to introduce error with that crazy-builder-like-thing is tremendous, because its unnatural.
        (In fact i think i spotted some things doing this patch, for other issues)

        What is wrong with normal java objects?

        Show
        Robert Muir added a comment - Simon i have never seen code with this pattern before. The chance to introduce error with that crazy-builder-like-thing is tremendous, because its unnatural. (In fact i think i spotted some things doing this patch, for other issues) What is wrong with normal java objects?
        Hide
        Simon Willnauer added a comment -

        I don't wanna fight this again, its too demanding for me. Go for it I don't think my opinion counts here obviously. I don't want to be in your way really. Sorry for raising an objection on the patch.

        Show
        Simon Willnauer added a comment - I don't wanna fight this again, its too demanding for me. Go for it I don't think my opinion counts here obviously. I don't want to be in your way really. Sorry for raising an objection on the patch.
        Hide
        Robert Muir added a comment -

        Go for it I don't think my opinion counts here obviously

        Why? because I disagree with you? Thats pretty natural man, normal for people to disagree on opinions.

        its not like anyone has committed any shit here, so quit overreacting.

        Show
        Robert Muir added a comment - Go for it I don't think my opinion counts here obviously Why? because I disagree with you? Thats pretty natural man, normal for people to disagree on opinions. its not like anyone has committed any shit here, so quit overreacting.
        Hide
        Uwe Schindler added a comment -

        Mr. Muir: I also disagree with you. If this gets committed I will revert it - just as you did with Yonik in the past - if it blows me out of Lucene PMC/Lucene Committers, who cares. Sorry, this pattern is very simple and often used.

        Show
        Uwe Schindler added a comment - Mr. Muir: I also disagree with you. If this gets committed I will revert it - just as you did with Yonik in the past - if it blows me out of Lucene PMC/Lucene Committers, who cares. Sorry, this pattern is very simple and often used.
        Hide
        Robert Muir added a comment -

        Thats fine Uwe, we can hold search performance hostage over this broken builder pattern

        This is gonna be great.

        Show
        Robert Muir added a comment - Thats fine Uwe, we can hold search performance hostage over this broken builder pattern This is gonna be great.
        Hide
        Robert Muir added a comment -

        Sorry, this pattern is very simple and often used.

        Where? For example where in the JDK uses this .def() etc?

        Show
        Robert Muir added a comment - Sorry, this pattern is very simple and often used. Where? For example where in the JDK uses this .def() etc?
        Hide
        Chris Male added a comment -

        Does Simon's suggestion of replacing def() with create(Bits) solve the mandatory Bits issue? Can we just fix the broken parts of the builder instead of outright replacing it?

        Show
        Chris Male added a comment - Does Simon's suggestion of replacing def() with create(Bits) solve the mandatory Bits issue? Can we just fix the broken parts of the builder instead of outright replacing it?
        Hide
        Simon Willnauer added a comment -

        Why? because I disagree with you? Thats pretty natural man, normal for people to disagree on opinions.

        because I don't see that you are going away from your opinion whatever I say. You made clear you don't want to have any builder pattern in lucene so what is the point of discussing then. I need to put up my own patch which uses the builder to make my objections being in the patch obviously which is not what I am used to. Usually we try to find a compromise and by iterating, right? Each time this pattern comes up there is no way that you move a tiny bit from your opinion just because you don't like it. Well I don't like things people suggest from a code syte perspective but it makes sense very often, so I change it. I don't see why this needs to go for rounds and rounds of fighting here. We did this for a reason when ScorerContext has introduced, it served as a little DSL on top of it enforcing immutability. if you want to have Bit mandatory you should just do something like ScorereContext.create(delDocs).topLevel(true).outOfOrder(false) which makes very clear what you want rather than new ScoreContext(delDocs, true, false) and we gonna have more boolean params here in the future.

        Show
        Simon Willnauer added a comment - Why? because I disagree with you? Thats pretty natural man, normal for people to disagree on opinions. because I don't see that you are going away from your opinion whatever I say. You made clear you don't want to have any builder pattern in lucene so what is the point of discussing then. I need to put up my own patch which uses the builder to make my objections being in the patch obviously which is not what I am used to. Usually we try to find a compromise and by iterating, right? Each time this pattern comes up there is no way that you move a tiny bit from your opinion just because you don't like it. Well I don't like things people suggest from a code syte perspective but it makes sense very often, so I change it. I don't see why this needs to go for rounds and rounds of fighting here. We did this for a reason when ScorerContext has introduced, it served as a little DSL on top of it enforcing immutability. if you want to have Bit mandatory you should just do something like ScorereContext.create(delDocs).topLevel(true).outOfOrder(false) which makes very clear what you want rather than new ScoreContext(delDocs, true, false) and we gonna have more boolean params here in the future.
        Hide
        Simon Willnauer added a comment -

        Where? For example where in the JDK uses this .def() etc?

        those are similar patterns:

        http://download.oracle.com/javase/1.5.0/docs/api/java/lang/ProcessBuilder.html
        http://download.oracle.com/javase/6/docs/api/java/sql/PreparedStatement.html
        http://download.oracle.com/javase/6/docs/api/javax/sql/DataSource.html

        we use def() as a shortcut for default() (keyword though) we should rather use create(Bits) IMP

        Show
        Simon Willnauer added a comment - Where? For example where in the JDK uses this .def() etc? those are similar patterns: http://download.oracle.com/javase/1.5.0/docs/api/java/lang/ProcessBuilder.html http://download.oracle.com/javase/6/docs/api/java/sql/PreparedStatement.html http://download.oracle.com/javase/6/docs/api/javax/sql/DataSource.html we use def() as a shortcut for default() (keyword though) we should rather use create(Bits) IMP
        Hide
        Robert Muir added a comment -

        because I don't see that you are going away from your opinion whatever I say.

        Thats right, I think i'm allowed to have my own opinion?

        All I did was upload a patch: I'm in no rush to commit here, we can just leave the issue open until everyone is happy.

        Show
        Robert Muir added a comment - because I don't see that you are going away from your opinion whatever I say. Thats right, I think i'm allowed to have my own opinion? All I did was upload a patch: I'm in no rush to commit here, we can just leave the issue open until everyone is happy.
        Hide
        Uwe Schindler added a comment -

        we can hold search performance hostage over this broken builder pattern

        You really think that additional method calls and Eden objects on highest level (which are called when creating scorers, not when scorers are consumed) will slowdown your search? Hey, program your stuff in C/C++ and use CLucene in future.

        Ah by the way, recode all toString() methods anywhere in Lucene and rip StringBuilder!

        Show
        Uwe Schindler added a comment - we can hold search performance hostage over this broken builder pattern You really think that additional method calls and Eden objects on highest level (which are called when creating scorers, not when scorers are consumed) will slowdown your search? Hey, program your stuff in C/C++ and use CLucene in future. Ah by the way, recode all toString() methods anywhere in Lucene and rip StringBuilder!
        Hide
        Robert Muir added a comment -

        You really think that additional method calls and Eden objects on highest level (which are called when creating scorers, not when scorers are consumed) will slowdown your search?

        No, I don't.

        I mean that we can hold our filter execution performance hostage over this internal API, since you have clearly voiced you will revert my commit if i commit the patch.

        Show
        Robert Muir added a comment - You really think that additional method calls and Eden objects on highest level (which are called when creating scorers, not when scorers are consumed) will slowdown your search? No, I don't. I mean that we can hold our filter execution performance hostage over this internal API, since you have clearly voiced you will revert my commit if i commit the patch.
        Hide
        Chris Male added a comment -

        I mean that we can hold our filter execution performance hostage over this internal API, since you have clearly voiced you will revert my commit if i commit the patch.

        Are you saying you wouldn't support an updated version of your patch that went back to builder style?

        Show
        Chris Male added a comment - I mean that we can hold our filter execution performance hostage over this internal API, since you have clearly voiced you will revert my commit if i commit the patch. Are you saying you wouldn't support an updated version of your patch that went back to builder style?
        Hide
        Robert Muir added a comment -

        those are similar patterns:

        http://download.oracle.com/javase/1.5.0/docs/api/java/lang/ProcessBuilder.html
        http://download.oracle.com/javase/6/docs/api/java/sql/PreparedStatement.html
        http://download.oracle.com/javase/6/docs/api/javax/sql/DataSource.html

        Actually none of those are similar at all.

        None of those create a new object on each setter, they are normal Builders.

        This ScorerContext is something different entirely.

        Show
        Robert Muir added a comment - those are similar patterns: http://download.oracle.com/javase/1.5.0/docs/api/java/lang/ProcessBuilder.html http://download.oracle.com/javase/6/docs/api/java/sql/PreparedStatement.html http://download.oracle.com/javase/6/docs/api/javax/sql/DataSource.html Actually none of those are similar at all. None of those create a new object on each setter, they are normal Builders. This ScorerContext is something different entirely.
        Hide
        Robert Muir added a comment -

        Are you saying you wouldn't support an updated version of your patch that went back to builder style?

        That's correct.

        Show
        Robert Muir added a comment - Are you saying you wouldn't support an updated version of your patch that went back to builder style? That's correct.
        Hide
        Uwe Schindler added a comment - - edited

        Are you saying you wouldn't support an updated version of your patch that went back to builder style?

        That's correct.

        I simply don't suport your patch - deadlock.

        Show
        Uwe Schindler added a comment - - edited Are you saying you wouldn't support an updated version of your patch that went back to builder style? That's correct. I simply don't suport your patch - deadlock.
        Hide
        Chris Male added a comment -

        Is there anything people are prepared to compromise on here?

        Show
        Chris Male added a comment - Is there anything people are prepared to compromise on here?
        Hide
        Robert Muir added a comment -

        I'd be happy with flags instead of the booleans.

        Show
        Robert Muir added a comment - I'd be happy with flags instead of the booleans.
        Hide
        Chris Male added a comment -

        I'm not one for rushing discussions but we're getting close with LUCENE-1536 which this is a large chunk of. Anyway we can come to an agreement here?

        Show
        Chris Male added a comment - I'm not one for rushing discussions but we're getting close with LUCENE-1536 which this is a large chunk of. Anyway we can come to an agreement here?
        Hide
        Michael McCandless added a comment -

        It would be nice if we could discuss the builder pattern (and other
        heated disagreements) without quickly degrading into deadlock. We
        need to be able to work through our disagreements so we can get back
        "real" improvements to Lucene and Solr. In this case LUCENE-1536 is
        an enormous performance gain.

        Net/net I don't like the use of the builder pattern for ScorerContext.
        It seems like overkill: we only have 3 settings here. Even by the
        proponents of the builder pattern this is overkill?

        I think chained setters are less readable (see LUCENE-2308).

        I do agree 2 booleans in a row is asking for sneaky trouble; we can
        add .setX instead (these fields need not be final)? Or the int flags
        to the ctor is a great solution too (I think we should do this for
        FieldType as well).

        Net/net plain old boring java object would work fine here.

        And, in general, I don't think we should let the builder pattern seep
        into Lucene, for the simple reason that it's obviously controversial.
        This is no different from any other controversial change in
        open-source...

        Also, one can always make a shim layer on top of Lucene that exposes
        builder APis for everything? QueryParser, *Query, *Field/Document,
        IR, IW, merge policies/schedulers, etc., all could be cutover to
        builder APIs "up above" right? If we can safely apply the builder
        pattern "on top", ie it need not pollute Lucene's core, why not do
        that? We should only make core changes that are not controversial or
        must be done in core.

        Show
        Michael McCandless added a comment - It would be nice if we could discuss the builder pattern (and other heated disagreements) without quickly degrading into deadlock. We need to be able to work through our disagreements so we can get back "real" improvements to Lucene and Solr. In this case LUCENE-1536 is an enormous performance gain. Net/net I don't like the use of the builder pattern for ScorerContext. It seems like overkill: we only have 3 settings here. Even by the proponents of the builder pattern this is overkill? I think chained setters are less readable (see LUCENE-2308 ). I do agree 2 booleans in a row is asking for sneaky trouble; we can add .setX instead (these fields need not be final)? Or the int flags to the ctor is a great solution too (I think we should do this for FieldType as well). Net/net plain old boring java object would work fine here. And, in general, I don't think we should let the builder pattern seep into Lucene, for the simple reason that it's obviously controversial. This is no different from any other controversial change in open-source... Also, one can always make a shim layer on top of Lucene that exposes builder APis for everything? QueryParser, *Query, *Field/Document, IR, IW, merge policies/schedulers, etc., all could be cutover to builder APIs "up above" right? If we can safely apply the builder pattern "on top", ie it need not pollute Lucene's core, why not do that? We should only make core changes that are not controversial or must be done in core.
        Hide
        Michael McCandless added a comment -

        I think int flags would work well? We only have 2 flags here, plus
        required acceptDocs.

        The vast majority of cases use the default flags:

          new ScorerContext(acceptDocs);
        

        And then for the few places that change the flags it'd be something
        like this:

          new ScorerContext(acceptDocs, ScorerFlags.TOP_LEVEL);
        

        We keep final-ness for the fields.

        This seems like a great solution?

        Failing that.... another option would be to just stop using an object
        here at all, and go back to passing explicit flags down to
        Weight.scorer.

        In fact one benefit of this is we get stronger typing, ie, we force at
        compilation time all Scorer impls to be fixed to handle the new
        setting (vs today where a Scorer can easily silently be missed, thus
        adding latent bug).

        This means on any addition to the scorer API (eg I've long wanted for
        caller to declare up front whether they need scores computed vs "only
        matching", ie MTQWF and CSQ would pass false), we break the API. But
        I think that's actually fine, even in 3.x: making your own Scorer is
        very expert.

        Show
        Michael McCandless added a comment - I think int flags would work well? We only have 2 flags here, plus required acceptDocs. The vast majority of cases use the default flags: new ScorerContext(acceptDocs); And then for the few places that change the flags it'd be something like this: new ScorerContext(acceptDocs, ScorerFlags.TOP_LEVEL); We keep final-ness for the fields. This seems like a great solution? Failing that.... another option would be to just stop using an object here at all, and go back to passing explicit flags down to Weight.scorer. In fact one benefit of this is we get stronger typing, ie, we force at compilation time all Scorer impls to be fixed to handle the new setting (vs today where a Scorer can easily silently be missed, thus adding latent bug). This means on any addition to the scorer API (eg I've long wanted for caller to declare up front whether they need scores computed vs "only matching", ie MTQWF and CSQ would pass false), we break the API. But I think that's actually fine, even in 3.x: making your own Scorer is very expert.
        Hide
        Chris Male added a comment -

        I don't like the idea of shying away from a change just because it's controversial. Sometimes its necessary to shake things up with new ideas.

        While I do agree that it'd be better if we could get past these arguments and make the real changes, you seem to be asking for those who advocate a builder like API here to compromise and for those who don't want such an API, to not? Thats a tough pill to swallow.

        Also, this API feels to me to be a lot more internal, so whether or not builders could be built on top of more outward facing concepts like QueryParsers, Field/Document etc, seems a different issue?

        Show
        Chris Male added a comment - I don't like the idea of shying away from a change just because it's controversial. Sometimes its necessary to shake things up with new ideas. While I do agree that it'd be better if we could get past these arguments and make the real changes, you seem to be asking for those who advocate a builder like API here to compromise and for those who don't want such an API, to not? Thats a tough pill to swallow. Also, this API feels to me to be a lot more internal, so whether or not builders could be built on top of more outward facing concepts like QueryParsers, Field/Document etc, seems a different issue?
        Hide
        Robert Muir added a comment -

        Lucene went a long time without builder APIs, if you like builders, you can do them in your own code (there is no need for us to enforce such antipatterns)

        This API for ScorerContext is an internal API. the intended consumer is a lucene developer (e.g. guys like me). Its not for average joe... especially in 4.x when you can more easily tweak the scoring API, I think very very few users will write custom queries.

        We don't need to design APIs to baby lucene committers about this stuff, especially about two booleans, but like I said I wouldn't get too upset about flags (even though personally i think its overkill too).

        Show
        Robert Muir added a comment - Lucene went a long time without builder APIs, if you like builders, you can do them in your own code (there is no need for us to enforce such antipatterns) This API for ScorerContext is an internal API. the intended consumer is a lucene developer (e.g. guys like me). Its not for average joe... especially in 4.x when you can more easily tweak the scoring API, I think very very few users will write custom queries. We don't need to design APIs to baby lucene committers about this stuff, especially about two booleans, but like I said I wouldn't get too upset about flags (even though personally i think its overkill too).
        Hide
        Robert Muir added a comment -

        Also, given the attitude presented here towards me uploading a patch (threatening to revert my commits no matter what etc), you can be damned sure I am going to be a total asshole about builder APIs.

        By taking such a ridiculous stance on this internal API you have sealed the fate of builders across the codebase.

        Instead of looking at the actual use case and thinking 'does this need a builder' you assume I'm completely against them: actually I don't have such a blanket opinion: if you look at some classes I have added to lucene you will see that I have even added builders myself... where it makes sense!

        But just because a class has two booleans doesnt mean that it needs a Builder: thats my problem, when you use a pattern as a hammer across the board like this then it becomes an antipattern, because its the wrong solution.

        In some cases Builder makes sense: I think just not here... but this doesn't matter now. I'm gonna scream about builders now even when they really are a good fit.

        Nice work.

        Show
        Robert Muir added a comment - Also, given the attitude presented here towards me uploading a patch (threatening to revert my commits no matter what etc), you can be damned sure I am going to be a total asshole about builder APIs. By taking such a ridiculous stance on this internal API you have sealed the fate of builders across the codebase. Instead of looking at the actual use case and thinking 'does this need a builder' you assume I'm completely against them: actually I don't have such a blanket opinion: if you look at some classes I have added to lucene you will see that I have even added builders myself... where it makes sense! But just because a class has two booleans doesnt mean that it needs a Builder: thats my problem, when you use a pattern as a hammer across the board like this then it becomes an antipattern, because its the wrong solution. In some cases Builder makes sense: I think just not here... but this doesn't matter now. I'm gonna scream about builders now even when they really are a good fit. Nice work.
        Hide
        Chris Male added a comment -

        I actually agree with a lot of what you're saying here Robert. Using the right approach at the right time is key and here you've definitely made a good argument for why we should use the constructor + boolean approach.

        I hope we can continue to have discussions on a case by case basis, about what approaches are best?

        Show
        Chris Male added a comment - I actually agree with a lot of what you're saying here Robert. Using the right approach at the right time is key and here you've definitely made a good argument for why we should use the constructor + boolean approach. I hope we can continue to have discussions on a case by case basis, about what approaches are best?
        Hide
        Steve Rowe added a comment -

        I'm gonna scream about builders now even when they really are a good fit. Nice work.

        Robert, you're blaming Uwe for your own future bad behavior when he threatens to use your tactics against you? Sweet.

        Show
        Steve Rowe added a comment - I'm gonna scream about builders now even when they really are a good fit. Nice work. Robert, you're blaming Uwe for your own future bad behavior when he threatens to use your tactics against you? Sweet.
        Hide
        Michael McCandless added a comment -

        I don't like the idea of shying away from a change just because it's controversial. Sometimes its necessary to shake things up with new ideas.

        +1 I'm all for pushing new ideas that make good, hard improvements to
        Lucene (like LUCENE-1536).

        you seem to be asking for those who advocate a builder like API here to compromise and for those who don't want such an API, to not? Thats a tough pill to swallow.

        I am in fact asking for that, because largely Lucene does not adopt
        the builder pattern today, the builder pattern is relatively new and
        trendy, vs Lucene's codebase, and now we see it seeping in, in various
        places/patches/etc. Not only is it new, it's also controversial
        amongst the committers.

        I think it's also reasonable because you can naturally layer the
        builder API on top of a simple java APIs, but not really vice/versa.
        One could create a very nice shim "Lucene builder APIs" library.
        It need not be in our core APIs.

        This way apps that love the builder pattern can use the builder shim;
        those that don't like them can use the plain java APIs.

        As other "popular" patterns try to seep into Lucene, I think we should
        also take a cautious stance: we should not apply a pattern just
        because it exists and has become popular; we should see strong
        technical benefits to Lucene before doing so.

        So, stepping back, "adopting the builder pattern in Lucene's APIs" is
        the overall change I'm talking about, and I think that is a big
        change.

        Also, this API feels to me to be a lot more internal, so whether or not builders could be built on top of more outward facing concepts like QueryParsers, Field/Document etc, seems a different issue?

        Right this is an internal API, but for example if you build custom
        queries/scorers you can still use a builder shim on top of Lucene's
        core ScorerContext. It could be part of this builder shim library
        too.

        Really "Lucene should adopt the builder pattern" to me is a big
        change, and it's a codebase-wide, global decision. It's actually very
        similar to passionate disagreements on whitespace, and this is why we
        (thankfully) have a hard standard on what our whitespace looks like.
        Otherwise we'd be having huge arguments about whether parens should be
        surrounded by whitespace on every patch.

        So net/net I don't think Lucene should adopt the builder pattern, for
        internal or external APIs. Just build a shim library on top.

        Show
        Michael McCandless added a comment - I don't like the idea of shying away from a change just because it's controversial. Sometimes its necessary to shake things up with new ideas. +1 I'm all for pushing new ideas that make good, hard improvements to Lucene (like LUCENE-1536 ). you seem to be asking for those who advocate a builder like API here to compromise and for those who don't want such an API, to not? Thats a tough pill to swallow. I am in fact asking for that, because largely Lucene does not adopt the builder pattern today, the builder pattern is relatively new and trendy, vs Lucene's codebase, and now we see it seeping in, in various places/patches/etc. Not only is it new, it's also controversial amongst the committers. I think it's also reasonable because you can naturally layer the builder API on top of a simple java APIs, but not really vice/versa. One could create a very nice shim "Lucene builder APIs" library. It need not be in our core APIs. This way apps that love the builder pattern can use the builder shim; those that don't like them can use the plain java APIs. As other "popular" patterns try to seep into Lucene, I think we should also take a cautious stance: we should not apply a pattern just because it exists and has become popular; we should see strong technical benefits to Lucene before doing so. So, stepping back, "adopting the builder pattern in Lucene's APIs" is the overall change I'm talking about, and I think that is a big change. Also, this API feels to me to be a lot more internal, so whether or not builders could be built on top of more outward facing concepts like QueryParsers, Field/Document etc, seems a different issue? Right this is an internal API, but for example if you build custom queries/scorers you can still use a builder shim on top of Lucene's core ScorerContext. It could be part of this builder shim library too. Really "Lucene should adopt the builder pattern" to me is a big change, and it's a codebase-wide, global decision. It's actually very similar to passionate disagreements on whitespace, and this is why we (thankfully) have a hard standard on what our whitespace looks like. Otherwise we'd be having huge arguments about whether parens should be surrounded by whitespace on every patch. So net/net I don't think Lucene should adopt the builder pattern, for internal or external APIs. Just build a shim library on top.
        Hide
        Chris Male added a comment -

        +1 Mike. I agree.

        Show
        Chris Male added a comment - +1 Mike. I agree.
        Hide
        Simon Willnauer added a comment -

        you seem to be asking for those who advocate a builder like API here to compromise and for those who don't want such an API, to not? Thats a tough pill to swallow.

        I can swallow tough pills so +1 - I think this pill is going to change a lot for me on how I see this project and how I feel to contribute.

        simon

        Show
        Simon Willnauer added a comment - you seem to be asking for those who advocate a builder like API here to compromise and for those who don't want such an API, to not? Thats a tough pill to swallow. I can swallow tough pills so +1 - I think this pill is going to change a lot for me on how I see this project and how I feel to contribute. simon
        Hide
        Robert Muir added a comment -

        Failing that.... another option would be to just stop using an object
        here at all, and go back to passing explicit flags down to
        Weight.scorer.

        This is interesting for discussion too: because a compile-time break is "better" than a runtime break I think in cases of changes to query/weight/scorer?

        Like we can look at a compile-time-break as a feature: if we add a new thing (e.g. imagine adding this bitset after the fact), its crap for us to say 'but we didnt break backwards compatibility' when in fact you really do need to change your code!

        Show
        Robert Muir added a comment - Failing that.... another option would be to just stop using an object here at all, and go back to passing explicit flags down to Weight.scorer. This is interesting for discussion too: because a compile-time break is "better" than a runtime break I think in cases of changes to query/weight/scorer? Like we can look at a compile-time-break as a feature: if we add a new thing (e.g. imagine adding this bitset after the fact), its crap for us to say 'but we didnt break backwards compatibility' when in fact you really do need to change your code!
        Hide
        Uwe Schindler added a comment -

        So net/net I don't think Lucene should adopt the builder pattern, for internal or external APIs. Just build a shim library on top.

        Let's start with telescopic ctors again, I am a fan of them! And non-final fields are the best I can think of! And don't forget freeze(), we should now freeze all development instead and make our opinions guarded by hotspot bugs.

        I will no longer discuss here, I will do something else, more productive, beyond Lucene.

        Show
        Uwe Schindler added a comment - So net/net I don't think Lucene should adopt the builder pattern, for internal or external APIs. Just build a shim library on top. Let's start with telescopic ctors again, I am a fan of them! And non-final fields are the best I can think of! And don't forget freeze(), we should now freeze all development instead and make our opinions guarded by hotspot bugs. I will no longer discuss here, I will do something else, more productive, beyond Lucene.
        Hide
        Chris Male added a comment -

        This is interesting for discussion too: because a compile-time break is "better" than a runtime break I think in cases of changes to query/weight/scorer?

        I agree. The APIs of Query/Weight/Scorer feel to me to be so important that we should be very wary when making changes, but when we do want to make changes (and we should) then being explicit that something has changed and that people need to look carefully at the new API, seems beneficial.

        At the same time the benefits of the Contexts was that we could more easily make API changes. But perhaps with these classes some reluctance is beneficial?

        What would the signatures look like if we dumped ScorerContext?

        Show
        Chris Male added a comment - This is interesting for discussion too: because a compile-time break is "better" than a runtime break I think in cases of changes to query/weight/scorer? I agree. The APIs of Query/Weight/Scorer feel to me to be so important that we should be very wary when making changes, but when we do want to make changes (and we should) then being explicit that something has changed and that people need to look carefully at the new API, seems beneficial. At the same time the benefits of the Contexts was that we could more easily make API changes. But perhaps with these classes some reluctance is beneficial? What would the signatures look like if we dumped ScorerContext?
        Hide
        Robert Muir added a comment -

        At the same time the benefits of the Contexts was that we could more easily make API changes. But perhaps with these classes some reluctance is beneficial?

        But I think this is a downside here, if we add some new flag (e.g. acceptedDocs) its important for you to change your code.
        The context gives you a false warm-fuzzy feeling: pretend we already had ScorerContext and we committed this issue:
        then your custom query continues to work fine, until you use a filter and its silently wrong!

        What would the signatures look like if we dumped ScorerContext?

        Like Lucene 3.x: IR, boolean, boolean, Bits

        Show
        Robert Muir added a comment - At the same time the benefits of the Contexts was that we could more easily make API changes. But perhaps with these classes some reluctance is beneficial? But I think this is a downside here, if we add some new flag (e.g. acceptedDocs) its important for you to change your code. The context gives you a false warm-fuzzy feeling: pretend we already had ScorerContext and we committed this issue: then your custom query continues to work fine, until you use a filter and its silently wrong! What would the signatures look like if we dumped ScorerContext? Like Lucene 3.x: IR, boolean, boolean, Bits
        Hide
        Chris Male added a comment -

        But I think this is a downside here, if we add some new flag (e.g. acceptedDocs) its important for you to change your code.
        The context gives you a false warm-fuzzy feeling: pretend we already had ScorerContext and we committed this issue:
        then your custom query continues to work fine, until you use a filter and its silently wrong!

        Agreed.

        Like Lucene 3.x: IR, boolean, boolean, Bits

        Do we have any default value issues or anything with those booleans?

        Show
        Chris Male added a comment - But I think this is a downside here, if we add some new flag (e.g. acceptedDocs) its important for you to change your code. The context gives you a false warm-fuzzy feeling: pretend we already had ScorerContext and we committed this issue: then your custom query continues to work fine, until you use a filter and its silently wrong! Agreed. Like Lucene 3.x: IR, boolean, boolean, Bits Do we have any default value issues or anything with those booleans?
        Hide
        Robert Muir added a comment -

        No, I think the scorercontext was geared at us being able to make changes without breaking the API: After
        Mike's comment the more I think about it I think this was a bad idea... its being used here like Solr's NamedList
        hammer...

        I think we should keep type safety on the Q/W/S apis to avoid traps... adding Bits like this to a ScorerContext
        is a great example where we 'break backwards compatibility' in a sneaky way... far better to have a compile break.

        So, +1 to go nuclear on ScorerContext completely.

        Show
        Robert Muir added a comment - No, I think the scorercontext was geared at us being able to make changes without breaking the API: After Mike's comment the more I think about it I think this was a bad idea... its being used here like Solr's NamedList hammer... I think we should keep type safety on the Q/W/S apis to avoid traps... adding Bits like this to a ScorerContext is a great example where we 'break backwards compatibility' in a sneaky way... far better to have a compile break. So, +1 to go nuclear on ScorerContext completely.
        Hide
        Chris Male added a comment -

        Can you want to put together a patch to that regard? I'm definitely +1 for the idea at the moment.

        Show
        Chris Male added a comment - Can you want to put together a patch to that regard? I'm definitely +1 for the idea at the moment.
        Hide
        Michael McCandless added a comment -

        But I think this is a downside here, if we add some new flag (e.g. acceptedDocs) its important for you to change your code.
        The context gives you a false warm-fuzzy feeling: pretend we already had ScorerContext and we committed this issue:
        then your custom query continues to work fine, until you use a filter and its silently wrong!

        Actually I think this is a serious problem with what we have today?

        It's really awful/dangerous if on upgrade, silently, filtering stops working against your custom Query.

        It's also awful if we internally mess up and miss a Query that should have been fixed to handle .acceptedDocs, which could easily happen today.

        OK I think we should just go back to passing the arguments directly, noting that this API is internal, so that custom queries out there will get a hard compile-time break, not silently get the wrong results, when there's an important change here. It's too dangerous to use a context object.

        Show
        Michael McCandless added a comment - But I think this is a downside here, if we add some new flag (e.g. acceptedDocs) its important for you to change your code. The context gives you a false warm-fuzzy feeling: pretend we already had ScorerContext and we committed this issue: then your custom query continues to work fine, until you use a filter and its silently wrong! Actually I think this is a serious problem with what we have today? It's really awful/dangerous if on upgrade, silently, filtering stops working against your custom Query. It's also awful if we internally mess up and miss a Query that should have been fixed to handle .acceptedDocs, which could easily happen today. OK I think we should just go back to passing the arguments directly, noting that this API is internal, so that custom queries out there will get a hard compile-time break, not silently get the wrong results, when there's an important change here. It's too dangerous to use a context object.
        Hide
        Robert Muir added a comment -

        sure, I'd be happy too... was gonna wait a little bit to see if anyone grossly objected.

        I still don't like the fact this dodges the builder discussion completely, but again my real opinion for the record:

        • I myself have added builders to lucene:
        1. PrefixCodedTerms.Builder: http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/PrefixCodedTerms.java
        2. SynonymMap.Builder: http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymMap.java
        • Pretty sure Mike added FST.Builder too!
        • In these cases i think builder is appropriate: its building a complex thing up thru a series of add()s or whtaever and then finalizing into an immutable thing (trie or FST).
        • my problems with Builder are I feel its only appropriate if you are actually Building something. I dont think having 2 booleans counts as 'building', and I().dont().want().lucenes().code().to().read().like().this() ... i think chaining only makes sense in certain cases like StringBuilder, but NOT for building an FST for example.
        • The builders i mentioned don't create objects on every add() or set(), and they aren't going to be in anyone's tight loop, so I think they make perfect sense.

        So really my opinion on Builders is no different than my opinion on ArrayList: you use them when its the appropriate solution and when it makes sense... I think thats when you are actually building stuff.

        Show
        Robert Muir added a comment - sure, I'd be happy too... was gonna wait a little bit to see if anyone grossly objected. I still don't like the fact this dodges the builder discussion completely, but again my real opinion for the record: I myself have added builders to lucene: PrefixCodedTerms.Builder: http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/PrefixCodedTerms.java SynonymMap.Builder: http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymMap.java Pretty sure Mike added FST.Builder too! In these cases i think builder is appropriate: its building a complex thing up thru a series of add()s or whtaever and then finalizing into an immutable thing (trie or FST). my problems with Builder are I feel its only appropriate if you are actually Building something. I dont think having 2 booleans counts as 'building', and I().dont().want().lucenes().code().to().read().like().this() ... i think chaining only makes sense in certain cases like StringBuilder, but NOT for building an FST for example. The builders i mentioned don't create objects on every add() or set(), and they aren't going to be in anyone's tight loop, so I think they make perfect sense. So really my opinion on Builders is no different than my opinion on ArrayList: you use them when its the appropriate solution and when it makes sense... I think thats when you are actually building stuff.
        Hide
        Chris Male added a comment -

        I think its a good idea to go ahead and put a patch together so we can discuss it directly.

        Show
        Chris Male added a comment - I think its a good idea to go ahead and put a patch together so we can discuss it directly.
        Hide
        Robert Muir added a comment -

        patch with the type-safe API and nuking ScorerContext

        Show
        Robert Muir added a comment - patch with the type-safe API and nuking ScorerContext
        Hide
        Robert Muir added a comment -

        by the way, this worked well for us already: the compile-break found some sneaky little scorers using their own deletedDocs instead of the acceptDocs.

        Ideally before we switch on any filter optimizations, we can hack up AssertingIndexSearcher to randomly use Bits/Filter in different ways to flush out lots of problems in tests.

        Show
        Robert Muir added a comment - by the way, this worked well for us already: the compile-break found some sneaky little scorers using their own deletedDocs instead of the acceptDocs. Ideally before we switch on any filter optimizations, we can hack up AssertingIndexSearcher to randomly use Bits/Filter in different ways to flush out lots of problems in tests.
        Hide
        Simon Willnauer added a comment -

        +1 this is much better than anything else! compile time error is good here even if it makes it less comfortable for changes ie. bw break.

        Show
        Simon Willnauer added a comment - +1 this is much better than anything else! compile time error is good here even if it makes it less comfortable for changes ie. bw break.
        Hide
        Michael McCandless added a comment -

        +1

        And it's awesome this already caught places where we were missing the acceptDocs cutover. Bug averted

        Show
        Michael McCandless added a comment - +1 And it's awesome this already caught places where we were missing the acceptDocs cutover. Bug averted
        Hide
        Chris Male added a comment -

        I did a quick review:

        • Should BooleanQuery.BooleanWeight.createConjunctionTermScorer() be passed the acceptDocs too? Its currently using context.reader.liveDocs.

        Otherwise, +1, we should maybe commit this and then spin off an issue for improving AssertingIndexSearcher?

        Show
        Chris Male added a comment - I did a quick review: Should BooleanQuery.BooleanWeight.createConjunctionTermScorer() be passed the acceptDocs too? Its currently using context.reader.liveDocs. Otherwise, +1, we should maybe commit this and then spin off an issue for improving AssertingIndexSearcher?
        Hide
        Robert Muir added a comment -

        Thanks Chris, good find.

        Otherwise, +1, we should maybe commit this and then spin off an issue for improving AssertingIndexSearcher?

        Actually I think this will be more easily done in LUCENE-1536? e.g. if we add the suggested heuristic there, as a boolean protected expert method, subclasses can override the heuristic if they need... and AssertingIndexSearcher could just return random.nextBoolean()

        Show
        Robert Muir added a comment - Thanks Chris, good find. Otherwise, +1, we should maybe commit this and then spin off an issue for improving AssertingIndexSearcher? Actually I think this will be more easily done in LUCENE-1536 ? e.g. if we add the suggested heuristic there, as a boolean protected expert method, subclasses can override the heuristic if they need... and AssertingIndexSearcher could just return random.nextBoolean()
        Hide
        Chris Male added a comment -

        Okay great. Lets commit this then.

        Show
        Chris Male added a comment - Okay great. Lets commit this then.
        Hide
        Robert Muir added a comment -

        This means on any addition to the scorer API (eg I've long wanted for
        caller to declare up front whether they need scores computed vs "only
        matching", ie MTQWF and CSQ would pass false), we break the API. But
        I think that's actually fine, even in 3.x: making your own Scorer is
        very expert.

        Just as an FYI: we already have an issue open for that too: LUCENE-3331

        But I don't think we will see real gains from that with StandardCodec/DocsEnum today?

        Show
        Robert Muir added a comment - This means on any addition to the scorer API (eg I've long wanted for caller to declare up front whether they need scores computed vs "only matching", ie MTQWF and CSQ would pass false), we break the API. But I think that's actually fine, even in 3.x: making your own Scorer is very expert. Just as an FYI: we already have an issue open for that too: LUCENE-3331 But I don't think we will see real gains from that with StandardCodec/DocsEnum today?
        Hide
        Michael McCandless added a comment -

        But I don't think we will see real gains from that with StandardCodec/DocsEnum today?

        Right, not yet (not until our enum impls are able to [efficiently] separately decode docs and docs+freqs), and so we can wait until then.

        Show
        Michael McCandless added a comment - But I don't think we will see real gains from that with StandardCodec/DocsEnum today? Right, not yet (not until our enum impls are able to [efficiently] separately decode docs and docs+freqs), and so we can wait until then.

          People

          • Assignee:
            Robert Muir
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development