Lucene - Core / LUCENE-1415

MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

      Description

      I found this while hunting for the cause of Solr Cache misses.

      The MultiPhraseQuery class's hashCode() implementation is non-deterministic: two queries with identical contents can produce different hash codes. It uses termArrays.hashCode() in the computation. The elements of that ArrayList are themselves arrays, which return their identity (reference-based) hash code instead of a hash code based on the contents of the array. I would suggest an implementation involving the Arrays.hashCode() method.

      I will try to submit a patch soon, off for today.
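
To illustrate the behaviour described above, here is a minimal sketch, not taken from the attached patch. It uses plain String[] as a stand-in for Lucene's Term[], and the class and method names are hypothetical. It compares the identity-based hash that an array returns from hashCode() with the content-based hash from Arrays.hashCode().

```java
import java.util.Arrays;

// Hypothetical demo class; String[] stands in for Term[] (an assumption for illustration).
class ArrayHashDemo {

    // Arrays inherit Object.hashCode(), which is identity based,
    // so two distinct arrays with equal contents rarely agree here.
    static boolean sameIdentityHash(Object[] x, Object[] y) {
        return x.hashCode() == y.hashCode();
    }

    // Arrays.hashCode(Object[]) is computed from the elements,
    // so arrays with equal contents always agree here.
    static boolean sameContentHash(Object[] x, Object[] y) {
        return Arrays.hashCode(x) == Arrays.hashCode(y);
    }

    public static void main(String[] args) {
        String[] first = {"ccc", "ddd"};
        String[] second = {"ccc", "ddd"};

        // Typically false: each array has its own identity hash.
        System.out.println("identity hashes match: " + sameIdentityHash(first, second));
        // Always true for equal contents.
        System.out.println("content hashes match:  " + sameContentHash(first, second));
    }
}
```

Because a cache such as Solr's looks entries up by hashCode() and equals(), a query whose hash changes from one equal instance to the next will not find its earlier entry.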

      1. MultiPhraseQueryTest.java (0.9 kB, Todd Feak)
      2. MultiPhraseQuery.java (11 kB, Todd Feak)
      3. LUCENE-1415.patch (3 kB, Mark Miller)
      4. LUCENE-1415.patch (3 kB, Mark Miller)

          Activity

          Yonik Seeley added a comment -

          Thanks, I just committed this.

          Uwe Schindler added a comment -

          That's clear: if you set the Java 5/6 compiler to the 1.4 source level, this only prevents you from using Java 5 language features. The class library it compiles against still comes from the same Java distribution as the compiler (Java 5's rt.jar), which contains Arrays.hashCode(). The compiler cannot know that Arrays.hashCode() is not available in 1.4 unless it uses an old rt.jar. If you want to be sure to compile against 1.4 only, you have to install Java 1.4.

          Mark Miller added a comment -

          Hmmm...I really thought I had my environment set up to limit to 1.4 code...it would appear that's not working...

          Here is a 1.4 patch.
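
For readers without the attachment, the following is a rough sketch of how a content-based array hash can be written without Arrays.hashCode(), which only exists from Java 5 onward. It is not the contents of the attached patch. The class and method names are hypothetical, and the formula simply mirrors the one documented for Arrays.hashCode(Object[]).

```java
// Hypothetical helper; avoids generics and the enhanced for loop
// so that it stays within what a Java 1.4 compiler accepts.
class Java14ArrayHash {

    // Mirrors the documented contract of java.util.Arrays.hashCode(Object[]):
    // 0 for a null array, otherwise start at 1 and fold in each element
    // as 31 * h + (element == null ? 0 : element.hashCode()).
    static int arrayHashCode(Object[] array) {
        if (array == null) {
            return 0;
        }
        int h = 1;
        for (int i = 0; i < array.length; i++) {
            Object element = array[i];
            h = 31 * h + (element == null ? 0 : element.hashCode());
        }
        return h;
    }

    public static void main(String[] args) {
        Object[] first = new Object[] {"ccc", "ddd"};
        Object[] second = new Object[] {"ccc", "ddd"};
        // Equal contents give equal hashes, regardless of array identity.
        System.out.println(arrayHashCode(first) == arrayHashCode(second));
    }
}
```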

          Yonik Seeley added a comment -

          Thanks guys,
          I believe Arrays.hashCode() is a Java 5 feature?

          Mark Miller added a comment -

          Patch that cleans up formatting and merges the unit test with the existing MultiPhraseQuery test.

          Without the MultiPhraseQuery change, the new test fails. With the change, all tests pass.

          Todd Feak added a comment -

          Attached a copy of what I did to MultiPhraseQuery to fix the issue. This was created from the 2.4.0 source code. The implementation of hashCode() and equals() uses the Java List implementation as a base, so as to achieve what looks like the original intent of the comparisons while taking the contents of each Term[] into account.

          Again, sorry it's not in the correct format. Hope it helps.
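
The approach described in this comment, following the List contract but comparing array elements by content, might look roughly like the sketch below. This is an illustration under stated assumptions, not the attached MultiPhraseQuery.java: String[] stands in for Term[], the class and method names are hypothetical, and it uses Java 5 APIs for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper; String[] stands in for Lucene's Term[].
class TermArrayListSupport {

    // Same shape as the hash documented for java.util.List.hashCode(),
    // but each element contributes Arrays.hashCode(...) instead of its identity hash.
    static int termArraysHashCode(List<String[]> termArrays) {
        int h = 1;
        for (String[] terms : termArrays) {
            h = 31 * h + (terms == null ? 0 : Arrays.hashCode(terms));
        }
        return h;
    }

    // Same shape as List.equals(): equal sizes and pairwise-equal elements,
    // with elements compared by content via Arrays.equals(...).
    static boolean termArraysEquals(List<String[]> a, List<String[]> b) {
        if (a.size() != b.size()) {
            return false;
        }
        Iterator<String[]> ia = a.iterator();
        Iterator<String[]> ib = b.iterator();
        while (ia.hasNext() && ib.hasNext()) {
            if (!Arrays.equals(ia.next(), ib.next())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String[]> first = new ArrayList<String[]>();
        first.add(new String[] {"ccc", "ddd"});
        List<String[]> second = new ArrayList<String[]>();
        second.add(new String[] {"ccc", "ddd"});

        System.out.println(termArraysEquals(first, second));
        System.out.println(termArraysHashCode(first) == termArraysHashCode(second));
    }
}
```

Keeping equals() and hashCode() aligned in this way matters because hash-based caches assume that equal keys always share a hash code.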

          Todd Feak added a comment - - edited

          I've attached a TestCase demonstrating the broken functionality.

          I realize that this isn't the standard format. I'm not set up for creating SVN patches from my current workstation, and I'm in a bit of a crunch. I hope that this can at least provide some level of assistance in rectifying this situation.

          Yonik Seeley added a comment -

          Good catch Todd, this can be demonstrated in Solr with the example server and a query of
          http://localhost:8983/solr/select/?q=ccc
          (ccc has synonyms which end up creating a MultiPhraseQuery)


            People

            • Assignee: Yonik Seeley
            • Reporter: Todd Feak