Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 2.9
    • Component/s: modules/other
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Issue for discussion here: http://www.lucidimagination.com/search/document/46a548a79ae9c809/move_trierange_to_core_module_and_integration_issues

      This patch is a rewrite of TrieRange using MultiTermQuery like all other core queries. This should make TrieRange identical in functionality to core range queries.

      1. LUCENE-1602.patch
        104 kB
        Uwe Schindler
      2. LUCENE-1602.patch
        96 kB
        Uwe Schindler
      3. LUCENE-1602.patch
        94 kB
        Uwe Schindler
      4. LUCENE-1602.patch
        94 kB
        Uwe Schindler
      5. LUCENE-1602.patch
        95 kB
        Uwe Schindler
      6. queries.zip
        39 kB
        Uwe Schindler
      7. queries.zip
        39 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Here the patch:
          1. There are some changes to FilterTermEnum in core:

          • Make the private members protected, to have access to them from the very special TrieRangeTermEnum
          • Fix a small inconsistency (docFreq() now only returns a value, if a valid term is existing)
          • All core tests pass

          2. TrieRange patch:

          • New TrieRangeQuery classes
          • TrieRangeFilter is now only a wrapper (like in core)
          • The work is done in package-private TrieRangeTermEnum. This was really a fun to implement this: It started with a very large and complicated version that got simplier and simplier at the end. I am really happy with it now. Maybe a extra test for the "special cases" should be added

          Some additional comments:
          The functions that return the unique term count hit during executing query/filter is implemented as a counter in the enum that counts all valid terms in termCompare(). Maybe this information is also interesting for other MultiTermQueries. How about counting this directly in MultiTermQuery for all such queries and export the method there? It would be really nice, to get e.g. the number of terms for a classical range query, to see the improvement. I could patch the MultiTermQuery to do this internally. Any comments about that?

          The package and class names are still contrib-like as before. A move to core would rename the classes as discussed.

          Show
          Uwe Schindler added a comment - Here the patch: 1. There are some changes to FilterTermEnum in core: Make the private members protected, to have access to them from the very special TrieRangeTermEnum Fix a small inconsistency (docFreq() now only returns a value, if a valid term is existing) All core tests pass 2. TrieRange patch: New TrieRangeQuery classes TrieRangeFilter is now only a wrapper (like in core) The work is done in package-private TrieRangeTermEnum. This was really a fun to implement this: It started with a very large and complicated version that got simplier and simplier at the end. I am really happy with it now. Maybe a extra test for the "special cases" should be added Some additional comments: The functions that return the unique term count hit during executing query/filter is implemented as a counter in the enum that counts all valid terms in termCompare(). Maybe this information is also interesting for other MultiTermQueries. How about counting this directly in MultiTermQuery for all such queries and export the method there? It would be really nice, to get e.g. the number of terms for a classical range query, to see the improvement. I could patch the MultiTermQuery to do this internally. Any comments about that? The package and class names are still contrib-like as before. A move to core would rename the classes as discussed.
          Hide
          Uwe Schindler added a comment -

          An additional extension to MultiTermQuery:
          The original version of TrieRange had a shortcut in the getDocIdSet call: If the range was inverse and would return for sure no documents, it returned the DocIdSet.EMPTY_DOCID_SET instance and did not allocate any OpenBitSet. MultiTermQuery could also do this automatically, if the FilteredTermEnum is empty from the beginning (simple check before allocating the bitset). This would be a nice memory-friendly imporvement. I could try to implement this in MultiTermQuery's getDocIdSet().

          Show
          Uwe Schindler added a comment - An additional extension to MultiTermQuery: The original version of TrieRange had a shortcut in the getDocIdSet call: If the range was inverse and would return for sure no documents, it returned the DocIdSet.EMPTY_DOCID_SET instance and did not allocate any OpenBitSet. MultiTermQuery could also do this automatically, if the FilteredTermEnum is empty from the beginning (simple check before allocating the bitset). This would be a nice memory-friendly imporvement. I could try to implement this in MultiTermQuery's getDocIdSet().
          Hide
          Michael McCandless added a comment -

          It'd be great to do both the EMPTY_DOCID_SET optimization and the getTermCount() addition to MultiTermQuery.

          Show
          Michael McCandless added a comment - It'd be great to do both the EMPTY_DOCID_SET optimization and the getTermCount() addition to MultiTermQuery.
          Hide
          Michael McCandless added a comment -

          For some reason, I'm having trouble applying this patch – alot of failures:

          patch -p0 < patch.x
          patching file contrib/queries/src/java/org/apache/lucene/search/trie/AbstractTrieRangeFilter.java
          Hunk #1 FAILED at 18.
          1 out of 1 hunk FAILED -- saving rejects to file contrib/queries/src/java/org/apache/lucene/search/trie/AbstractTrieRangeFilter.java.rej
          patching file contrib/queries/src/java/org/apache/lucene/search/trie/AbstractTrieRangeQuery.java
          patching file contrib/queries/src/java/org/apache/lucene/search/trie/IntTrieRangeFilter.java
          Hunk #1 FAILED at 17.
          Hunk #2 FAILED at 39.
          2 out of 2 hunks FAILED -- saving rejects to file contrib/queries/src/java/org/apache/lucene/search/trie/IntTrieRangeFilter.java.rej
          patching file contrib/queries/src/java/org/apache/lucene/search/trie/IntTrieRangeQuery.java
          patching file contrib/queries/src/java/org/apache/lucene/search/trie/LongTrieRangeFilter.java
          Hunk #1 FAILED at 17.
          Hunk #2 FAILED at 39.
          2 out of 2 hunks FAILED -- saving rejects to file contrib/queries/src/java/org/apache/lucene/search/trie/LongTrieRangeFilter.java.rej
          patching file contrib/queries/src/java/org/apache/lucene/search/trie/LongTrieRangeQuery.java
          patching file contrib/queries/src/java/org/apache/lucene/search/trie/package.html
          patching file contrib/queries/src/java/org/apache/lucene/search/trie/TrieRangeTermEnum.java
          patching file contrib/queries/src/test/org/apache/lucene/search/trie/TestIntTrieRangeFilter.java
          Reversed (or previously applied) patch detected!  Assume -R? [n] n
          n
          Apply anyway? [n] n
          n
          Skipping patch.
          1 out of 1 hunk ignored -- saving rejects to file contrib/queries/src/test/org/apache/lucene/search/trie/TestIntTrieRangeFilter.java.rej
          patching file contrib/queries/src/test/org/apache/lucene/search/trie/TestIntTrieRangeQuery.java
          patching file contrib/queries/src/test/org/apache/lucene/search/trie/TestLongTrieRangeFilter.java
          patching file contrib/queries/src/test/org/apache/lucene/search/trie/TestLongTrieRangeQuery.java
          patching file src/java/org/apache/lucene/search/FilteredTermEnum.java
          

          Uwe, is your area up to date? How did you generate the diff?

          Show
          Michael McCandless added a comment - For some reason, I'm having trouble applying this patch – alot of failures: patch -p0 < patch.x patching file contrib/queries/src/java/org/apache/lucene/search/trie/AbstractTrieRangeFilter.java Hunk #1 FAILED at 18. 1 out of 1 hunk FAILED -- saving rejects to file contrib/queries/src/java/org/apache/lucene/search/trie/AbstractTrieRangeFilter.java.rej patching file contrib/queries/src/java/org/apache/lucene/search/trie/AbstractTrieRangeQuery.java patching file contrib/queries/src/java/org/apache/lucene/search/trie/IntTrieRangeFilter.java Hunk #1 FAILED at 17. Hunk #2 FAILED at 39. 2 out of 2 hunks FAILED -- saving rejects to file contrib/queries/src/java/org/apache/lucene/search/trie/IntTrieRangeFilter.java.rej patching file contrib/queries/src/java/org/apache/lucene/search/trie/IntTrieRangeQuery.java patching file contrib/queries/src/java/org/apache/lucene/search/trie/LongTrieRangeFilter.java Hunk #1 FAILED at 17. Hunk #2 FAILED at 39. 2 out of 2 hunks FAILED -- saving rejects to file contrib/queries/src/java/org/apache/lucene/search/trie/LongTrieRangeFilter.java.rej patching file contrib/queries/src/java/org/apache/lucene/search/trie/LongTrieRangeQuery.java patching file contrib/queries/src/java/org/apache/lucene/search/trie/ package .html patching file contrib/queries/src/java/org/apache/lucene/search/trie/TrieRangeTermEnum.java patching file contrib/queries/src/test/org/apache/lucene/search/trie/TestIntTrieRangeFilter.java Reversed (or previously applied) patch detected! Assume -R? [n] n n Apply anyway? [n] n n Skipping patch. 1 out of 1 hunk ignored -- saving rejects to file contrib/queries/src/test/org/apache/lucene/search/trie/TestIntTrieRangeFilter.java.rej patching file contrib/queries/src/test/org/apache/lucene/search/trie/TestIntTrieRangeQuery.java patching file contrib/queries/src/test/org/apache/lucene/search/trie/TestLongTrieRangeFilter.java patching file contrib/queries/src/test/org/apache/lucene/search/trie/TestLongTrieRangeQuery.java patching file src/java/org/apache/lucene/search/FilteredTermEnum.java Uwe, is your area up to date? How did you generate the diff?
          Hide
          Uwe Schindler added a comment -

          New patch that includes new tests:

          • EMPTY_DOCIDSET case on inverse range
          • a comparison of TrieRange and classic Range in term numbers

          The patch LUCENE-1603 must be applied before.

          Show
          Uwe Schindler added a comment - New patch that includes new tests: EMPTY_DOCIDSET case on inverse range a comparison of TrieRange and classic Range in term numbers The patch LUCENE-1603 must be applied before.
          Hide
          Uwe Schindler added a comment -

          Hi Mike:
          I'm up-to-date. Maybe its the old problem of TortoiseSVN on Windows: It generates CR-LF instead of LF alone. dostounix should help. I always apply patches using the TortoiseSVN merge function (that also keeps track of correct local versions from patch header and merges alltogether).

          Show
          Uwe Schindler added a comment - Hi Mike: I'm up-to-date. Maybe its the old problem of TortoiseSVN on Windows: It generates CR-LF instead of LF alone. dostounix should help. I always apply patches using the TortoiseSVN merge function (that also keeps track of correct local versions from patch header and merges alltogether).
          Hide
          Michael McCandless added a comment -

          Hmm it's not the end-of-line issue (though I see that the files are all DOS; we should fix with "svn propset svn:eol-style native contrib/queries/src/java/org/apache/lucene/search/trie/*.java" after we've committed these changes); when I normalize the EOLs it's failing in the same way. Odd.

          Can you instead post a tar file w/ your current *.java?

          Show
          Michael McCandless added a comment - Hmm it's not the end-of-line issue (though I see that the files are all DOS; we should fix with "svn propset svn:eol-style native contrib/queries/src/java/org/apache/lucene/search/trie/*.java" after we've committed these changes); when I normalize the EOLs it's failing in the same way. Odd. Can you instead post a tar file w/ your current *.java?
          Hide
          Uwe Schindler added a comment -

          Here a ZIP file with the source and tests only of the trie package. Please note: two test files were removed by the patch, so you must remove them by hand.

          I tried to patch a fresh checkout of trunk without problems. TortoiseMerge patched all files without problems.

          Show
          Uwe Schindler added a comment - Here a ZIP file with the source and tests only of the trie package. Please note: two test files were removed by the patch, so you must remove them by hand. I tried to patch a fresh checkout of trunk without problems. TortoiseMerge patched all files without problems.
          Hide
          Michael McCandless added a comment -

          These changes look good on quick pass. I can't compile/test based on the zip (since we renamed the new method), but I like the new approach.

          Show
          Michael McCandless added a comment - These changes look good on quick pass. I can't compile/test based on the zip (since we renamed the new method), but I like the new approach.
          Hide
          Uwe Schindler added a comment -

          Here updated files. It also fixes a bug in the tests (filter term count was incorrect)

          Show
          Uwe Schindler added a comment - Here updated files. It also fixes a bug in the tests (filter term count was incorrect)
          Hide
          Michael McCandless added a comment -

          OK all tests pass for me! Looks good.

          Show
          Michael McCandless added a comment - OK all tests pass for me! Looks good.
          Hide
          Uwe Schindler added a comment -

          This patch adds some missing getter methods in filter variant. The ZIP file is not added, as this methods are not really needed for testing, just for completeness.

          Show
          Uwe Schindler added a comment - This patch adds some missing getter methods in filter variant. The ZIP file is not added, as this methods are not really needed for testing, just for completeness.
          Hide
          Uwe Schindler added a comment -

          This is the final patch, with the changes for LUCENE-1603. I also added svn:eol-style to all files in trie and test-trie.
          Because this is not yet committed, the patch may still fail to apply, but I will commit in the next few hours.

          Show
          Uwe Schindler added a comment - This is the final patch, with the changes for LUCENE-1603 . I also added svn:eol-style to all files in trie and test-trie. Because this is not yet committed, the patch may still fail to apply, but I will commit in the next few hours.
          Hide
          Uwe Schindler added a comment -

          Committed revision 765618.

          Show
          Uwe Schindler added a comment - Committed revision 765618.
          Hide
          Uwe Schindler added a comment -

          Fixed the incomplete hashcode(), equals() and toString() of TrieRangeQueries in revision 767982.

          Show
          Uwe Schindler added a comment - Fixed the incomplete hashcode(), equals() and toString() of TrieRangeQueries in revision 767982.

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development