Lucene - Core
  1. Lucene - Core
  2. LUCENE-2064

Highlighter should support all MultiTermQuery subclasses without casts

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.1
    • Fix Version/s: 3.0
    • Component/s: modules/highlighter
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      In order to support MultiTermQuery subclasses the Highlighter component applies instanceof checks for concrete classes from the lucene core. This prevents classes like RegexQuery in contrib from being supported. Introducing dependencies on other contribs is not feasible just for being supported by the highlighter.

      While the instanceof checks and subsequent casts might hopefully go somehow away in the future but for supporting more multterm queries I have a alternative approach using a fake IndexReader that uses a RewriteMethod to force the MTQ to pass the field name to the given reader without doing any real work. It is easier to explain once you see the patch - I will upload shortly.

      1. LUCENE-2064.patch
        4 kB
        Uwe Schindler
      2. LUCENE-2064.patch
        4 kB
        Uwe Schindler
      3. LUCENE-2064.txt
        4 kB
        Simon Willnauer

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          someone at apachecon told me you can fix this in your prefs...

          but i don't have a preference to change dateformat!
          How can we fix this problem, its very annoying?

          Show
          Robert Muir added a comment - someone at apachecon told me you can fix this in your prefs... but i don't have a preference to change dateformat! How can we fix this problem, its very annoying?
          Hide
          Simon Willnauer added a comment -

          Argh! Fixed in rev. 836176
          Thanks uwe I did look at the right one first - I hate that AM / PM issue.

          Show
          Simon Willnauer added a comment - Argh! Fixed in rev. 836176 Thanks uwe I did look at the right one first - I hate that AM / PM issue.
          Hide
          Uwe Schindler added a comment -

          You did not use the latest patch, which was simplier and had extra checks? The am pm problem again.

          Show
          Uwe Schindler added a comment - You did not use the latest patch, which was simplier and had extra checks? The am pm problem again.
          Hide
          Simon Willnauer added a comment -

          Commited in revision 836161

          Show
          Simon Willnauer added a comment - Commited in revision 836161
          Hide
          Uwe Schindler added a comment -

          +1

          Show
          Uwe Schindler added a comment - +1
          Hide
          Simon Willnauer added a comment -

          More simple, as MemoryIndex also provides the empty TermEnum.

          Good stuff - I plan to commit this soon if nobody objects.

          Show
          Simon Willnauer added a comment - More simple, as MemoryIndex also provides the empty TermEnum. Good stuff - I plan to commit this soon if nobody objects.
          Hide
          Uwe Schindler added a comment -

          More simple, as MemoryIndex also provides the empty TermEnum.

          Show
          Uwe Schindler added a comment - More simple, as MemoryIndex also provides the empty TermEnum.
          Hide
          Mark Miller added a comment -

          Nice Uwe! - good idea.

          Show
          Mark Miller added a comment - Nice Uwe! - good idea.
          Hide
          Mark Miller added a comment - - edited

          Introducing dependencies on other contribs is not feasible just for being supported by the highlighter.

          Oh its feasible We already depend on the only contrib that currently has a multiterm query - regex - and memory index. But it looks like the regex dependency snuck in there while working on the spanregexquery support - I don't think its actually needed anymore - we should remove it. Its only a build dependency, so its not actually a big deal - just annoying if it happened to keep growing.

          edit

          Hmm - actually, it looks like we can't avoid those dependencies after all - not if we want to test those queries - looks like the contrib dependency on regex stays anyway. Forgot its just there for the tests now.

          Show
          Mark Miller added a comment - - edited Introducing dependencies on other contribs is not feasible just for being supported by the highlighter. Oh its feasible We already depend on the only contrib that currently has a multiterm query - regex - and memory index. But it looks like the regex dependency snuck in there while working on the spanregexquery support - I don't think its actually needed anymore - we should remove it. Its only a build dependency, so its not actually a big deal - just annoying if it happened to keep growing. edit Hmm - actually, it looks like we can't avoid those dependencies after all - not if we want to test those queries - looks like the contrib dependency on regex stays anyway. Forgot its just there for the tests now.
          Hide
          Uwe Schindler added a comment -

          Here the solution with empty MemoryIndex. This seems to be the quickest solution.

          Show
          Uwe Schindler added a comment - Here the solution with empty MemoryIndex. This seems to be the quickest solution.
          Hide
          Mark Miller added a comment -

          As I said, thinking about it, I don't think we can end up fixing it in a better way. We can't force older impls out there to implement what we need - sure we can fix it in core easy enough, but its a real hassle to do this in another way that doesnt require outside multitermquery impls to change - we are going to have to fall back to this anyway with any current plans. So might as well nix those plans for now. I'd prefer our "futurebetterhighlighter" prompt any changes that require so much hassle. Its prob best just to stick with this method.

          I'd just make it so the rest of the IndexReader methods act as if the thing is empty - letting it throw a null pointer exception and catching it makes those impls unhighlightable when they likely could be.

          Show
          Mark Miller added a comment - As I said, thinking about it, I don't think we can end up fixing it in a better way. We can't force older impls out there to implement what we need - sure we can fix it in core easy enough, but its a real hassle to do this in another way that doesnt require outside multitermquery impls to change - we are going to have to fall back to this anyway with any current plans. So might as well nix those plans for now. I'd prefer our "futurebetterhighlighter" prompt any changes that require so much hassle. Its prob best just to stick with this method. I'd just make it so the rest of the IndexReader methods act as if the thing is empty - letting it throw a null pointer exception and catching it makes those impls unhighlightable when they likely could be.
          Hide
          Uwe Schindler added a comment -

          I agree with the exception, but which reader are you talking about.

          You are right, there is no IR available when instantiating (Smart)FakeReader. So catching the NPE is the only way, or implement all methods of IR to return something valid / use an empty static final unmodifiable MemoryIndex as delegate of the FakeReader.

          Show
          Uwe Schindler added a comment - I agree with the exception, but which reader are you talking about. You are right, there is no IR available when instantiating (Smart)FakeReader. So catching the NPE is the only way, or implement all methods of IR to return something valid / use an empty static final unmodifiable MemoryIndex as delegate of the FakeReader.
          Hide
          Simon Willnauer added a comment -

          As I stated above, i see this as a neat workaround until we fix this class / contrib eventually. It won't hurt performance or breaks any compatibility, its hidden deep inside the abysses of Highlighter. Most importantly it adds little functionality to the highlighter component which I believe a lot of people still using.

          I will add another patch tomorrow which catches the exception.

          Show
          Simon Willnauer added a comment - As I stated above, i see this as a neat workaround until we fix this class / contrib eventually. It won't hurt performance or breaks any compatibility, its hidden deep inside the abysses of Highlighter. Most importantly it adds little functionality to the highlighter component which I believe a lot of people still using. I will add another patch tomorrow which catches the exception.
          Hide
          Simon Willnauer added a comment -

          Or simply use the passed in reader as delegate of FakeReader, then it will behave correctly for all methods.

          I agree with the exception, but which reader are you talking about.

          Btw. I was close to name it SmartFakeReader

          Show
          Simon Willnauer added a comment - Or simply use the passed in reader as delegate of FakeReader, then it will behave correctly for all methods. I agree with the exception, but which reader are you talking about. Btw. I was close to name it SmartFakeReader
          Hide
          Mark Miller added a comment -

          I still think getFields on multitermquery is a better option than a Fieldable interface. But if we would drop back to this method anyway, I see no reason to anything with field and multitermquery at all really - unless another use case prompts it.

          Show
          Mark Miller added a comment - I still think getFields on multitermquery is a better option than a Fieldable interface. But if we would drop back to this method anyway, I see no reason to anything with field and multitermquery at all really - unless another use case prompts it.
          Hide
          Uwe Schindler added a comment - - edited

          One comment to the patch:
          If a MTQ subclass does something special during rewrite or in its FilteredTermEnum and calls any other method of FakeReader, it throws NPE. You should catch this Exception and in this case fall back to extract no terms.

          EDIT

          Or simply use the passed in reader as delegate of FakeReader, then it will behave correctly for all methods.

          Show
          Uwe Schindler added a comment - - edited One comment to the patch: If a MTQ subclass does something special during rewrite or in its FilteredTermEnum and calls any other method of FakeReader, it throws NPE. You should catch this Exception and in this case fall back to extract no terms. EDIT Or simply use the passed in reader as delegate of FakeReader, then it will behave correctly for all methods.
          Hide
          Uwe Schindler added a comment -

          Maybe we should add this patch for 3.0 to not break anything after upgrading to 3.0. As it is completely internal in Highlighter, it would not break anything. Requiring a method in 3.0, whcih should be 2.9 compatible and no new functionality would be not good.

          In 3.1 we could add a Fieldable interface that defines getField and eerybody (could) implement it. If not, we could still use this fallback.

          Show
          Uwe Schindler added a comment - Maybe we should add this patch for 3.0 to not break anything after upgrading to 3.0. As it is completely internal in Highlighter, it would not break anything. Requiring a method in 3.0, whcih should be 2.9 compatible and no new functionality would be not good. In 3.1 we could add a Fieldable interface that defines getField and eerybody (could) implement it. If not, we could still use this fallback.
          Hide
          Mark Miller added a comment -

          It is clever - personally I think I'd prefer the getFields method - this is kind of a hack to get the field - though a pretty clever hack. I suppose we could make the argument that this can tide us over - but it will only take a couple of minutes to add getFields as well.

          I think Simon may argue that this will work in more cases by default - where external queries would have to implement the getFeilds method. Which is a good point. Still would prefer something cleaner, but perhaps that makes this worth it nonetheless. It would prob make sense to fall back to this if getFields returned an empty set anyway - which almost makes it not even worth it to do getFields as things don't get any cleaner ...

          We def want the multitermquery clone - thats for sure - Uwe recently mentioned that as well and I'd been meaning to get around to it myself.

          Show
          Mark Miller added a comment - It is clever - personally I think I'd prefer the getFields method - this is kind of a hack to get the field - though a pretty clever hack. I suppose we could make the argument that this can tide us over - but it will only take a couple of minutes to add getFields as well. I think Simon may argue that this will work in more cases by default - where external queries would have to implement the getFeilds method. Which is a good point. Still would prefer something cleaner, but perhaps that makes this worth it nonetheless. It would prob make sense to fall back to this if getFields returned an empty set anyway - which almost makes it not even worth it to do getFields as things don't get any cleaner ... We def want the multitermquery clone - thats for sure - Uwe recently mentioned that as well and I'd been meaning to get around to it myself.
          Hide
          Uwe Schindler added a comment -

          I think there was some discussion on the list about allow a getField() or something for multitermqueries, but in the meantime, this patch will work for now, and its internal to the highlighter so its not like it would have to be deprecated later.

          We discussed on ApacheCon about that. My idea was a Fieldable interface that provides getField and the highlighter must only check instanceof Fieldable. And cloning is really the simpliest to create a copy of MTQ, as all Query instances are cloneable (because Query implements Cloneable)

          Show
          Uwe Schindler added a comment - I think there was some discussion on the list about allow a getField() or something for multitermqueries, but in the meantime, this patch will work for now, and its internal to the highlighter so its not like it would have to be deprecated later. We discussed on ApacheCon about that. My idea was a Fieldable interface that provides getField and the highlighter must only check instanceof Fieldable. And cloning is really the simpliest to create a copy of MTQ, as all Query instances are cloneable (because Query implements Cloneable)
          Hide
          Uwe Schindler added a comment -

          Sorry I was somehow like completely drunk.... I laughed so much about this patch! Only UnexpectedSuccessException is missing in it.

          Show
          Uwe Schindler added a comment - Sorry I was somehow like completely drunk.... I laughed so much about this patch! Only UnexpectedSuccessException is missing in it.
          Hide
          Uwe Schindler added a comment - - edited

          This is cool.

          Highlighter & MTQ is broken in 2.9.1. This patch looks completely broken, but it isn't - and my mind was also broken when I first saw the patch - because of that. This patch is cooler than all heavy commiting during ApacheCon.

          +1 for 3.0 with this patch.That was what I wanted to say with my complete nonsense comment.

          Show
          Uwe Schindler added a comment - - edited This is cool. Highlighter & MTQ is broken in 2.9.1. This patch looks completely broken, but it isn't - and my mind was also broken when I first saw the patch - because of that. This patch is cooler than all heavy commiting during ApacheCon. +1 for 3.0 with this patch.That was what I wanted to say with my complete nonsense comment.
          Hide
          Robert Muir added a comment -

          Uwe, I agree, I think you should set to 3.0

          I think there was some discussion on the list about allow a getField() or something for multitermqueries, but in the meantime, this patch will work for now, and its internal to the highlighter so its not like it would have to be deprecated later.

          Show
          Robert Muir added a comment - Uwe, I agree, I think you should set to 3.0 I think there was some discussion on the list about allow a getField() or something for multitermqueries, but in the meantime, this patch will work for now, and its internal to the highlighter so its not like it would have to be deprecated later.
          Hide
          Simon Willnauer added a comment -

          This is the patch - please let me know if I miss something especially related to the removed copyMultiTermQuery method which I replaced with a clone call. - All tests pass.

          Show
          Simon Willnauer added a comment - This is the patch - please let me know if I miss something especially related to the removed copyMultiTermQuery method which I replaced with a clone call. - All tests pass.

            People

            • Assignee:
              Simon Willnauer
              Reporter:
              Simon Willnauer
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development