Lucene - Core
  1. Lucene - Core
  2. LUCENE-1748

getPayloadSpans on org.apache.lucene.search.spans.SpanQuery should be abstract

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.4, 2.4.1
    • Fix Version/s: 2.9
    • Component/s: core/query/scoring
    • Labels:
      None
    • Environment:

      all

    • Lucene Fields:
      New, Patch Available

      Description

      I just spent a long time tracking down a bug resulting from upgrading to Lucene 2.4.1 on a project that implements some SpanQuerys of its own and was written against 2.3. Since the project's SpanQuerys didn't implement getPayloadSpans, the call to that method went to SpanQuery.getPayloadSpans which returned null and caused a NullPointerException in the Lucene code, far away from the actual source of the problem.

      It would be much better for this kind of thing to show up at compile time, I think.

      Thanks!

      1. LUCENE-1748.patch
        7 kB
        Mark Miller
      2. LUCENE-1748.patch
        33 kB
        Mark Miller
      3. LUCENE-1748.patch
        34 kB
        Mark Miller

        Activity

        Hide
        Mark Miller added a comment -

        The reason it has a default impl is so that it won't break jar drop in back compat - if it was abstract, it would violate our back compat requirements.

        But you make a most excellent point. Shouldnt it throw a runtime exception (unsupported operation?) or something?

        Show
        Mark Miller added a comment - The reason it has a default impl is so that it won't break jar drop in back compat - if it was abstract, it would violate our back compat requirements. But you make a most excellent point. Shouldnt it throw a runtime exception (unsupported operation?) or something?
        Hide
        Earwin Burrfoot added a comment -

        Shouldnt it throw a runtime exception (unsupported operation?) or something?

        What is the difference between adding an abstract method and adding a method that throws exception in regards to jar drop in back compat?
        In both cases when you drop your new jar in you get an exception, except in the latter case exception is deferred.

        Show
        Earwin Burrfoot added a comment - Shouldnt it throw a runtime exception (unsupported operation?) or something? What is the difference between adding an abstract method and adding a method that throws exception in regards to jar drop in back compat? In both cases when you drop your new jar in you get an exception, except in the latter case exception is deferred.
        Hide
        Hugh Cayless added a comment - - edited

        Ah. I figured it would be something like that. Yes, if abstract isn't possible, an UnsupportedOperationException would at least get closer to the source of the problem.

        From my perspective at least, backwards compatibility is already broken, since Lucene doesn't work with SpanQuerys that don't implement getPayloadSpans--but I understand y'all will have different requirements in this regard.

        Show
        Hugh Cayless added a comment - - edited Ah. I figured it would be something like that. Yes, if abstract isn't possible, an UnsupportedOperationException would at least get closer to the source of the problem. From my perspective at least, backwards compatibility is already broken, since Lucene doesn't work with SpanQuerys that don't implement getPayloadSpans--but I understand y'all will have different requirements in this regard.
        Hide
        Mark Miller added a comment -

        My response sent to mailing list:

        >>bq. Shouldnt it throw a runtime exception (unsupported operation?) or something?
        >>What is the difference between adding an abstract method and adding a method that throws exception in regards to jar drop in back compat?
        >>In both cases when you drop your new jar in you get an exception, except in the latter case exception is deferred.

        Yeah, its dicey - I suppose the idea is that, if you used the code as you used to, it wouldnt try and call getPayloadSpans? And so if you kept using non payloadspans functionality, you would be set, and if you tried to use payloadspans you would get an exception saying the class needed to be updated? But if you make it abstract, we lose jar drop (I know I've read we don't have it for this release anyway) in and everyone has to implement the method. At least with the exception, if you are using the class as you used to, you can continue to do so with no work? Not that I 've considered it for very long at the moment.

        I know, I see your point - this back compat stuff is always dicey - thats why I throw it out there with a question mark - hopefully others will continue to chime in.

        • Show quoted text -
        Show
        Mark Miller added a comment - My response sent to mailing list: >>bq. Shouldnt it throw a runtime exception (unsupported operation?) or something? >>What is the difference between adding an abstract method and adding a method that throws exception in regards to jar drop in back compat? >>In both cases when you drop your new jar in you get an exception, except in the latter case exception is deferred. Yeah, its dicey - I suppose the idea is that, if you used the code as you used to, it wouldnt try and call getPayloadSpans? And so if you kept using non payloadspans functionality, you would be set, and if you tried to use payloadspans you would get an exception saying the class needed to be updated? But if you make it abstract, we lose jar drop (I know I've read we don't have it for this release anyway) in and everyone has to implement the method. At least with the exception, if you are using the class as you used to, you can continue to do so with no work? Not that I 've considered it for very long at the moment. I know, I see your point - this back compat stuff is always dicey - thats why I throw it out there with a question mark - hopefully others will continue to chime in. Show quoted text -
        Hide
        Mark Miller added a comment -

        From my perspective at least, backwards compatibility is already broken, since Lucene doesn't work with SpanQuerys that don't implement getPayloadSpans

        Ah, I see - I hadn't looked at this issue in a long time. It looks like you must implement it to do much of anything right?

        We need to address this better - perhaps abstract is the way to go.

        Show
        Mark Miller added a comment - From my perspective at least, backwards compatibility is already broken, since Lucene doesn't work with SpanQuerys that don't implement getPayloadSpans Ah, I see - I hadn't looked at this issue in a long time. It looks like you must implement it to do much of anything right? We need to address this better - perhaps abstract is the way to go.
        Hide
        Earwin Burrfoot added a comment - - edited

        I took a glance at the code, the whole getPayloadSpans deal is a herecy.

        Each and every implementation looks like:
        public PayloadSpans getPayloadSpans(IndexReader reader) throws IOException

        { return (PayloadSpans) getSpans(reader); }

        Moving it to the base SpanQuery is broken equally to current solution, but yields much less strange copypaste.

        I also have a faint feeling that if you expose a method like
        ClassA method();
        you can then upgrade it to
        SubclassOfClassA method();
        without breaking drop-in compatibility, which renders getPayloadSpans vs getSpans alternative totally useless
        Ok, I'm wrong.

        Show
        Earwin Burrfoot added a comment - - edited I took a glance at the code, the whole getPayloadSpans deal is a herecy. Each and every implementation looks like: public PayloadSpans getPayloadSpans(IndexReader reader) throws IOException { return (PayloadSpans) getSpans(reader); } Moving it to the base SpanQuery is broken equally to current solution, but yields much less strange copypaste. I also have a faint feeling that if you expose a method like ClassA method(); you can then upgrade it to SubclassOfClassA method(); without breaking drop-in compatibility, which renders getPayloadSpans vs getSpans alternative totally useless Ok, I'm wrong.
        Hide
        Mark Miller added a comment -

        Okay, so it says: Implementing classes that want access to the payloads will need to implement this.

        But in reality, if you don't implement it, looks like your screwed if you add it to the container SpanQueries. whether you access the payloads or not.

        Show
        Mark Miller added a comment - Okay, so it says: Implementing classes that want access to the payloads will need to implement this. But in reality, if you don't implement it, looks like your screwed if you add it to the container SpanQueries. whether you access the payloads or not.
        Hide
        Mark Miller added a comment -

        the whole getPayloadSpans deal is a herecy.

        heh. don't dig too deep - it also has to load all of the payloads as it matches whether you ask for them or not (if they exist).

        The ordered or unordered matcher also has to load them and dump them in certain situation when they are not actually needed.

        Lets look at what we need to do to fix this - we don't have to worry too much about back compat, cause its already pretty screwed I think.

        Show
        Mark Miller added a comment - the whole getPayloadSpans deal is a herecy. heh. don't dig too deep - it also has to load all of the payloads as it matches whether you ask for them or not (if they exist). The ordered or unordered matcher also has to load them and dump them in certain situation when they are not actually needed. Lets look at what we need to do to fix this - we don't have to worry too much about back compat, cause its already pretty screwed I think.
        Hide
        Mark Miller added a comment -

        Something should be done here. I admit I havn't spent a lot of time thinking yet, but off the cuff :

        1. We should drop PayloadSpans and just add getPayload to Spans. This should be a compile time break.

        2. We need a way to not collect payloads. Now they will be collected if they are there whether they are used or not. Perhaps an argument in the SpanQuery constructor? It can't really be getSpans because the user would need to control it.

        Show
        Mark Miller added a comment - Something should be done here. I admit I havn't spent a lot of time thinking yet, but off the cuff : 1. We should drop PayloadSpans and just add getPayload to Spans. This should be a compile time break. 2. We need a way to not collect payloads. Now they will be collected if they are there whether they are used or not. Perhaps an argument in the SpanQuery constructor? It can't really be getSpans because the user would need to control it.
        Hide
        Earwin Burrfoot added a comment -

        We should drop PayloadSpans and just add getPayload to Spans. This should be a compile time break.

        +1

        Show
        Earwin Burrfoot added a comment - We should drop PayloadSpans and just add getPayload to Spans. This should be a compile time break. +1
        Hide
        Mark Miller added a comment -

        I don't like to make these back compat calls, but it seems this was a big break and we are breaking jar drop in anyway, so its a nice chance to just kind of correct things.

        This patch is a first draft at resolving the two main issues I'm concerned about:

        1. PayloadSpans is dropped and moved to Spans. Back compat was so out the window anyway, its much better to just make users with custom SpanQuerys implement the new methods. You can always return unavailable payload, and empty payload set, or even not implement and avoid the functionality.

        2. Add a constructor arg to turn off payload collecting for an ordered spannear - it did not lazy load as the payloads javadoc said - the others do, but not ordered spanear. This should let you ignore the new functionality if you don't want to implement - without this, if you had payloads but didn't want to implement, this would blow up at just the sight of payloads. Even if you didn't have a custom SpanQuery to worry about, the query would be much slower if you had payloads but didnt need the query to interact with them - they would be loaded anyway (and its not even always efficient loading - sometimes it has to load a few uneeded payloads that later get dropped) - so you need a way to turn off loading.

        Show
        Mark Miller added a comment - I don't like to make these back compat calls, but it seems this was a big break and we are breaking jar drop in anyway, so its a nice chance to just kind of correct things. This patch is a first draft at resolving the two main issues I'm concerned about: 1. PayloadSpans is dropped and moved to Spans. Back compat was so out the window anyway, its much better to just make users with custom SpanQuerys implement the new methods. You can always return unavailable payload, and empty payload set, or even not implement and avoid the functionality. 2. Add a constructor arg to turn off payload collecting for an ordered spannear - it did not lazy load as the payloads javadoc said - the others do, but not ordered spanear. This should let you ignore the new functionality if you don't want to implement - without this, if you had payloads but didn't want to implement, this would blow up at just the sight of payloads. Even if you didn't have a custom SpanQuery to worry about, the query would be much slower if you had payloads but didnt need the query to interact with them - they would be loaded anyway (and its not even always efficient loading - sometimes it has to load a few uneeded payloads that later get dropped) - so you need a way to turn off loading.
        Hide
        Hugh Cayless added a comment -

        Just wanted to say thanks for your responsiveness on this. It really validates my faith in Lucene as a project.

        Show
        Hugh Cayless added a comment - Just wanted to say thanks for your responsiveness on this. It really validates my faith in Lucene as a project.
        Hide
        Mark Miller added a comment -

        This is going to require a patch to the 2.4 back compat branch to pass tests.

        Show
        Mark Miller added a comment - This is going to require a patch to the 2.4 back compat branch to pass tests.
        Hide
        Mark Miller added a comment -

        I'm going to commit this to the backcompat branch as well - I can't just change the tests - the src has to be changed as well because the tests compile against it rather than trunk.

        Show
        Mark Miller added a comment - I'm going to commit this to the backcompat branch as well - I can't just change the tests - the src has to be changed as well because the tests compile against it rather than trunk.
        Hide
        Mark Miller added a comment -

        Too trunk - removes the lazy load payload issue which has been resolved and committed in a separate issue.

        Still have to consider what to do about the back compat tests.

        Show
        Mark Miller added a comment - Too trunk - removes the lazy load payload issue which has been resolved and committed in a separate issue. Still have to consider what to do about the back compat tests.
        Hide
        Mark Miller added a comment -

        Has something like this come up in the past? How was the back compat branch handled? Should I just remove the PayloadSpans test? It doesn't seem right to apply the patch to the back compat src - it wouldn't apply cleanly anyway, and it doesn't seem right to fit it in. I guess it may not matter though? I'm just worried it will be like pulling a string ...

        the issue:
        The PayloadSpans test was made to test PayloadSpans - but that class shouldn't have been made and the methods from it have been added to Spans, and its been removed. Now the back compat branch test won't compile against the back compat src.

        Show
        Mark Miller added a comment - Has something like this come up in the past? How was the back compat branch handled? Should I just remove the PayloadSpans test? It doesn't seem right to apply the patch to the back compat src - it wouldn't apply cleanly anyway, and it doesn't seem right to fit it in. I guess it may not matter though? I'm just worried it will be like pulling a string ... the issue: The PayloadSpans test was made to test PayloadSpans - but that class shouldn't have been made and the methods from it have been added to Spans, and its been removed. Now the back compat branch test won't compile against the back compat src.
        Hide
        Michael McCandless added a comment -

        Can you make the corresponding changes to the backcompat branch's sources & tests? Note that they only need to be "stubs", so that things compile, since it's the trunk's JAR that they'll be tested against.

        Show
        Michael McCandless added a comment - Can you make the corresponding changes to the backcompat branch's sources & tests? Note that they only need to be "stubs", so that things compile, since it's the trunk's JAR that they'll be tested against.
        Hide
        Mark Miller added a comment -

        Okay - my first time messing with the back compat tests, and changing the src felt dirty - but nothing else that can be done, and as you say, it doesn't hurt anything. Ive got things worked out locally.

        I'll give a shot at commiting this later today.

        Show
        Mark Miller added a comment - Okay - my first time messing with the back compat tests, and changing the src felt dirty - but nothing else that can be done, and as you say, it doesn't hurt anything. Ive got things worked out locally. I'll give a shot at commiting this later today.
        Hide
        Mark Miller added a comment -

        Thanks Hugh!

        Show
        Mark Miller added a comment - Thanks Hugh!
        Hide
        Mark Miller added a comment -

        I'm tempted to make Spans abstract. We don't get these chances often. We managed to make Weight abstract for 2.9, and I think similar logic applies here. We are already breaking the interface - why not go abstract as well? Its a little more trouble, but we are already so far up the creak, it could be worth it no? Its very hard to get rid of these interfaces in a normal situation, and a better opportunity may not present itself.

        Show
        Mark Miller added a comment - I'm tempted to make Spans abstract. We don't get these chances often. We managed to make Weight abstract for 2.9, and I think similar logic applies here. We are already breaking the interface - why not go abstract as well? Its a little more trouble, but we are already so far up the creak, it could be worth it no? Its very hard to get rid of these interfaces in a normal situation, and a better opportunity may not present itself.
        Hide
        Michael McCandless added a comment -

        I'm tempted to make Spans abstract.

        +1

        Show
        Michael McCandless added a comment - I'm tempted to make Spans abstract. +1
        Hide
        Mark Miller added a comment -

        makes Spans abstract

        Show
        Mark Miller added a comment - makes Spans abstract
        Hide
        Michael McCandless added a comment -

        Patch looks good... just need to fix back-compat tests.

        Show
        Michael McCandless added a comment - Patch looks good... just need to fix back-compat tests.
        Hide
        Mark Miller added a comment -

        thanks for taking a look Mike!

        Show
        Mark Miller added a comment - thanks for taking a look Mike!

          People

          • Assignee:
            Mark Miller
            Reporter:
            Hugh Cayless
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development