Lucene - Core
  1. Lucene - Core
  2. LUCENE-2945

Surround Query doesn't properly handle equals/hashcode

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.3, 3.1, 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      In looking at using the surround queries with Solr, I am hitting issues caused by collisions due to equals/hashcode not being implemented on the anonymous inner classes that are created by things like DistanceQuery (branch 3.x, near line 76)

      1. LUCENE-2945-partial1.patch
        8 kB
        Paul Elschot
      2. LUCENE-2945e.patch
        23 kB
        Simon Rosenthal
      3. LUCENE-2945e.patch
        23 kB
        Simon Rosenthal
      4. LUCENE-2945d.patch
        21 kB
        Paul Elschot
      5. LUCENE-2945d.patch
        23 kB
        Paul Elschot
      6. LUCENE-2945c.patch
        14 kB
        Paul Elschot
      7. LUCENE-2945.patch
        4 kB
        Paul Elschot
      8. LUCENE-2945.patch
        6 kB
        Grant Ingersoll
      9. LUCENE-2945.patch
        14 kB
        Grant Ingersoll

        Issue Links

          Activity

          Hide
          Paul Elschot added a comment -

          This one is in the category oversights that will come back to haunt

          I found 15 occurrences in the Surround source code of this regex for method definitions:
          Query .*makeLucene.*Query

          Of these, the following actually lack equals/hash:
          DistanceQuery line 76 (as above) and SimpleTerm line 78.
          Taking a closer look at both, the underlying reason why hash/equals is needed is the implementation of Query.rewrite().
          These are the only ones in Surround that do that, so they should indeed be the only culprits here.

          I´ll provide a patch to implement hash/equals for them.

          Show
          Paul Elschot added a comment - This one is in the category oversights that will come back to haunt I found 15 occurrences in the Surround source code of this regex for method definitions: Query .*makeLucene.*Query Of these, the following actually lack equals/hash: DistanceQuery line 76 (as above) and SimpleTerm line 78. Taking a closer look at both, the underlying reason why hash/equals is needed is the implementation of Query.rewrite(). These are the only ones in Surround that do that, so they should indeed be the only culprits here. I´ll provide a patch to implement hash/equals for them.
          Hide
          Paul Elschot added a comment -

          It turns out that 14 or 15 classes in o.a.l.queryParser.surround.query need hashCode() and equals() for this.
          That is also probably the reason why I did not add these methods initially.

          Show
          Paul Elschot added a comment - It turns out that 14 or 15 classes in o.a.l.queryParser.surround.query need hashCode() and equals() for this. That is also probably the reason why I did not add these methods initially.
          Hide
          Paul Elschot added a comment -

          The -partial1 patch is about half way through the files in the surround.query package, and it is not even compiled yet.

          I don't know how much time I have for this for the rest of the week, so Grant, if you are in a hurry you can take it from here.

          The patch also removes some internally unused code, I suppose no one has used that yet.

          Show
          Paul Elschot added a comment - The -partial1 patch is about half way through the files in the surround.query package, and it is not even compiled yet. I don't know how much time I have for this for the rest of the week, so Grant, if you are in a hurry you can take it from here. The patch also removes some internally unused code, I suppose no one has used that yet.
          Hide
          Paul Elschot added a comment - - edited

          A quick fix would be to implement hashCode() and equals() in SrndQuery, the basis of the class hierarchy in surround.query, by using the actual class of the object and by delegating to toString().
          This will work because I implemented toString() all over this class hierarchy to verify the parsing results during development.

          For use in a query results cache, this might be somewhat too restrictive because there is no need to distinguish between the infix and prefix forms. Anyway, that issue would be the same for an implementation of equals() and hashCode() all over the class hierarchy.

          Show
          Paul Elschot added a comment - - edited A quick fix would be to implement hashCode() and equals() in SrndQuery, the basis of the class hierarchy in surround.query, by using the actual class of the object and by delegating to toString(). This will work because I implemented toString() all over this class hierarchy to verify the parsing results during development. For use in a query results cache, this might be somewhat too restrictive because there is no need to distinguish between the infix and prefix forms. Anyway, that issue would be the same for an implementation of equals() and hashCode() all over the class hierarchy.
          Hide
          Grant Ingersoll added a comment -

          What about the anonymous inner classes that actually construct the Query? I think those are the primary cause of the problem.

          Show
          Grant Ingersoll added a comment - What about the anonymous inner classes that actually construct the Query? I think those are the primary cause of the problem.
          Hide
          Paul Elschot added a comment -

          Ok, I forgot that inner classes can use the enclosing object, but they do not expose it.
          That means that for the quick fix above the inner classes need to delegate to their enclosing objects.

          I'll try this quick fix next. It could be good enough.

          Show
          Paul Elschot added a comment - Ok, I forgot that inner classes can use the enclosing object, but they do not expose it. That means that for the quick fix above the inner classes need to delegate to their enclosing objects. I'll try this quick fix next. It could be good enough.
          Hide
          Paul Elschot added a comment -

          The patch of 2 March 2011 contains the quick fix as indicated, and a deprecation for an unused method.
          All contrib tests pass and the javadocs build correctly, but there is no test added for the added hashCode() and equals().

          Show
          Paul Elschot added a comment - The patch of 2 March 2011 contains the quick fix as indicated, and a deprecation for an unused method. All contrib tests pass and the javadocs build correctly, but there is no test added for the added hashCode() and equals().
          Hide
          Paul Elschot added a comment -

          Grant, could you review the patch?

          Show
          Paul Elschot added a comment - Grant, could you review the patch?
          Hide
          Grant Ingersoll added a comment -

          but there is no test added for the added hashCode() and equals().

          Note, QueryUtils has methods for that.

          I will review soon.

          Show
          Grant Ingersoll added a comment - but there is no test added for the added hashCode() and equals(). Note, QueryUtils has methods for that. I will review soon.
          Hide
          Grant Ingersoll added a comment -

          Here's a patch that has a test using QueryUtil that fails. I don't think the getClass() approach is quite right for the base class equals.

          Show
          Grant Ingersoll added a comment - Here's a patch that has a test using QueryUtil that fails. I don't think the getClass() approach is quite right for the base class equals.
          Hide
          Grant Ingersoll added a comment -

          OK, here's a patch with a test that passes. I'm not entirely thrilled about the implementation of equals/hash on the two inner classes (used to be anonymous) but I do think it works. Namely, I use the syntax of the original query as a string, per Paul's original suggestion as part of the hash/equals. It just seems awkward to have to pass that in solely for this purpose, but I didn't see what other information I had around that would make the object unique from an equals/hash standpoint. I suppose the underlying queries list on the ComposedQuery might work and I can try that if others think it makes more sense.

          Show
          Grant Ingersoll added a comment - OK, here's a patch with a test that passes. I'm not entirely thrilled about the implementation of equals/hash on the two inner classes (used to be anonymous) but I do think it works. Namely, I use the syntax of the original query as a string, per Paul's original suggestion as part of the hash/equals. It just seems awkward to have to pass that in solely for this purpose, but I didn't see what other information I had around that would make the object unique from an equals/hash standpoint. I suppose the underlying queries list on the ComposedQuery might work and I can try that if others think it makes more sense.
          Hide
          Paul Elschot added a comment -

          As to the patch of 5 March, QueryUtils uses clone() to test hashCode() and I'd rather not support clone() because of the presence of the basic query factory and because I don't expect reparsing to be a problem to start a clone.
          Also implementing equals() on an anonymous inner class is not easily possible when hashCode() uses a "qualified this", because equals() would need the same qualification on the other object and I don't see a way to have that. An explicit reference from an object of a named static inner class gets around that, and I am curious to know whether equals() could be implemented without an explicit reference in this case.

          I have started coding in these directions, once some tests pass I'll post a patch.

          Show
          Paul Elschot added a comment - As to the patch of 5 March, QueryUtils uses clone() to test hashCode() and I'd rather not support clone() because of the presence of the basic query factory and because I don't expect reparsing to be a problem to start a clone. Also implementing equals() on an anonymous inner class is not easily possible when hashCode() uses a "qualified this", because equals() would need the same qualification on the other object and I don't see a way to have that. An explicit reference from an object of a named static inner class gets around that, and I am curious to know whether equals() could be implemented without an explicit reference in this case. I have started coding in these directions, once some tests pass I'll post a patch.
          Hide
          Grant Ingersoll added a comment -

          The Query class already is cloneable so it needs to support what the QueryUtils is doing. I think it is the anonymous inner class (or in my case, just the inner class) that is the one that matters for all of this. It is an instance of Query and thus needs a proper equals/hashcode. I don't really care about the outer containing classes other than I think it is a misnomer to call them Query classes when they really are factory classes for creating Lucene Queries.

          Show
          Grant Ingersoll added a comment - The Query class already is cloneable so it needs to support what the QueryUtils is doing. I think it is the anonymous inner class (or in my case, just the inner class) that is the one that matters for all of this. It is an instance of Query and thus needs a proper equals/hashcode. I don't really care about the outer containing classes other than I think it is a misnomer to call them Query classes when they really are factory classes for creating Lucene Queries.
          Hide
          Paul Elschot added a comment -

          The Query class already is cloneable so it needs to support what the QueryUtils is doing.

          Would that include throwing a CloneNotSupportedException?

          For these classes I could not find a better name in their package when I wrote this.
          Also I wanted the possibility to generate a query for another engine,
          so I needed an (factory) layer between the parser and the final query.
          There is already a BasicQueryFactory in there that generates Lucene TermQuery and SpanTermQuery leaf objects,
          so perhaps the other Lucene Query objects could also be made there.
          These others are objects of the inner classes that need hashCode() and equals() here, and Lucene BooleanQuery objects.
          This could be a spin off issue.

          Show
          Paul Elschot added a comment - The Query class already is cloneable so it needs to support what the QueryUtils is doing. Would that include throwing a CloneNotSupportedException? For these classes I could not find a better name in their package when I wrote this. Also I wanted the possibility to generate a query for another engine, so I needed an (factory) layer between the parser and the final query. There is already a BasicQueryFactory in there that generates Lucene TermQuery and SpanTermQuery leaf objects, so perhaps the other Lucene Query objects could also be made there. These others are objects of the inner classes that need hashCode() and equals() here, and Lucene BooleanQuery objects. This could be a spin off issue.
          Hide
          Paul Elschot added a comment -

          The LUCENE-2945c.patch starts from the patch of 5 March. It adds static inner classes to with hashCode() and equals() as needed here.
          For now, these classes throw a RuntimeException created from a CloneNotSupportedException in their clone() methods. This leaves clone() not correctly implemented, but at least now a RuntimeException is thrown instead of previously returning an incorrect result.

          The patch also includes a single passing test in SrndQueryTest for equal queries when parsed from strings that only differ in whitespace. The other tests there have been commented out because they use clone() via QueryUtils

          More tests are still needed, also for inequality. The earlier tests all pass.

          Show
          Paul Elschot added a comment - The LUCENE-2945 c.patch starts from the patch of 5 March. It adds static inner classes to with hashCode() and equals() as needed here. For now, these classes throw a RuntimeException created from a CloneNotSupportedException in their clone() methods. This leaves clone() not correctly implemented, but at least now a RuntimeException is thrown instead of previously returning an incorrect result. The patch also includes a single passing test in SrndQueryTest for equal queries when parsed from strings that only differ in whitespace. The other tests there have been commented out because they use clone() via QueryUtils More tests are still needed, also for inequality. The earlier tests all pass.
          Hide
          Paul Elschot added a comment - - edited

          Basically the 2945d patch of 16 March 2011 is a refactoring of the 2945c patch. The static inner classes have been moved to package private classes, and their common function was moved to a new super class.

          Also a few more test cases were added. Test cases for testing not equals might be still be added, but I don't see a real need to do that.

          As this adds handling equals/hashcode and has hardly any redundancy, I think this is close to committable. The patch also deprecates a compare..() method, I don't know whether the comments there are to the point.

          Show
          Paul Elschot added a comment - - edited Basically the 2945d patch of 16 March 2011 is a refactoring of the 2945c patch. The static inner classes have been moved to package private classes, and their common function was moved to a new super class. Also a few more test cases were added. Test cases for testing not equals might be still be added, but I don't see a real need to do that. As this adds handling equals/hashcode and has hardly any redundancy, I think this is close to committable. The patch also deprecates a compare..() method, I don't know whether the comments there are to the point.
          Hide
          Paul Elschot added a comment - - edited

          New -2945d patch that also has the changes to SpanNearClauseFactory.

          Show
          Paul Elschot added a comment - - edited New -2945d patch that also has the changes to SpanNearClauseFactory.
          Hide
          Paul Elschot added a comment -

          Does the latest patch solve the original problem as expected?

          Show
          Paul Elschot added a comment - Does the latest patch solve the original problem as expected?
          Hide
          Simon Rosenthal added a comment -

          Paul -
          can you refactor the 2945d patch so that it will apply cleanly to the reorganized source tree, as the surround parser is now under modules/ ?

          Show
          Simon Rosenthal added a comment - Paul - can you refactor the 2945d patch so that it will apply cleanly to the reorganized source tree, as the surround parser is now under modules/ ?
          Hide
          Paul Elschot added a comment -

          I noticed the local conflict when the move was done, and I resolved it by reverting the changes in my working copy.
          That means I would be starting from the patch here and that would not be less work for me than for anyone else.
          My first try would be to rename the directories in the patch to the new situation.

          Show
          Paul Elschot added a comment - I noticed the local conflict when the move was done, and I resolved it by reverting the changes in my working copy. That means I would be starting from the patch here and that would not be less work for me than for anyone else. My first try would be to rename the directories in the patch to the new situation.
          Hide
          Simon Rosenthal added a comment -

          LUCENE-29453.patch uploaded.

          modified from 2945d as you suggested, and it applied cleanly to 08/08 nightly build

          Show
          Simon Rosenthal added a comment - LUCENE-29453.patch uploaded. modified from 2945d as you suggested, and it applied cleanly to 08/08 nightly build
          Hide
          Simon Rosenthal added a comment -

          revised patch – needed changes package statements for few files. Applies and compiles cleanly now.

          Show
          Simon Rosenthal added a comment - revised patch – needed changes package statements for few files. Applies and compiles cleanly now.
          Hide
          Erik Hatcher added a comment -

          The current patch is for trunk only. I'll go ahead apply, test, and commit there.

          Simon, et al - if there is a desire to have this on 3.x, can someone create a patch for that branch?

          Show
          Erik Hatcher added a comment - The current patch is for trunk only. I'll go ahead apply, test, and commit there. Simon, et al - if there is a desire to have this on 3.x, can someone create a patch for that branch?
          Hide
          Erik Hatcher added a comment -

          Committed patch to trunk. Slight modified - caught a couple of queryParser -> queryparser changes (one caused compilation issue).

          Show
          Erik Hatcher added a comment - Committed patch to trunk. Slight modified - caught a couple of queryParser -> queryparser changes (one caused compilation issue).
          Hide
          Erik Hatcher added a comment -

          noticed this was still unresolved. closing, fixed on trunk a while ago.

          Show
          Erik Hatcher added a comment - noticed this was still unresolved. closing, fixed on trunk a while ago.

            People

            • Assignee:
              Erik Hatcher
              Reporter:
              Grant Ingersoll
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development