Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This will save ram and cleanup a lot of the code.

      Unfortunately the code is still a mess, it has a custom iterator api, and prefixcodedterms has yet another custom iterator api (seriously, maybe the worst one ever).

      1. LUCENE-6350.patch
        22 kB
        Adrien Grand
      2. LUCENE-6350.patch
        15 kB
        Robert Muir
      3. LUCENE-6350.patch
        15 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        Updated patch. I gave TermsQuery a hashcode again, so not all termsqueries have the same hashcode.

        This is especially bad since it takes the time to precompute the hashcode.

        I reopened LUCENE-6333, that needs more review.

        Show
        Robert Muir added a comment - Updated patch. I gave TermsQuery a hashcode again, so not all termsqueries have the same hashcode. This is especially bad since it takes the time to precompute the hashcode. I reopened LUCENE-6333 , that needs more review.
        Hide
        Adrien Grand added a comment -

        +1

        Show
        Adrien Grand added a comment - +1
        Hide
        Michael McCandless added a comment -

        I think we should commit LUCENE-6315 (simplifies this package-private
        iterator) before making this iterator public? Also, can you mark
        PrefixCodedTerms @internal (NOT @experimental) before making it public?
        This is not intended to be a publicly consumable API.

        You don't need to do the field.equals(lastField) on each term? The
        iterator tells you when it switches to a new field.

        You added an "if (disi == null)" check, which is great; do we have a
        test that tickles that?

        Is RAMFile.equals/hashCode or PrefixCodedTerms.equals/hashCode ever
        used anywhere else in the code base (because now they are quite
        costly)? I assume not...

        Show
        Michael McCandless added a comment - I think we should commit LUCENE-6315 (simplifies this package-private iterator) before making this iterator public? Also, can you mark PrefixCodedTerms @internal (NOT @experimental) before making it public? This is not intended to be a publicly consumable API. You don't need to do the field.equals(lastField) on each term? The iterator tells you when it switches to a new field. You added an "if (disi == null)" check, which is great; do we have a test that tickles that? Is RAMFile.equals/hashCode or PrefixCodedTerms.equals/hashCode ever used anywhere else in the code base (because now they are quite costly)? I assume not...
        Hide
        Adrien Grand added a comment -

        You added an "if (disi == null)" check, which is great; do we have a test that tickles that?

        BitDocIdSet always returns a non-null iterator so this branch would never be used currently. But I think it's still a good practice since DISI.iterator is allowed to return null?

        Show
        Adrien Grand added a comment - You added an "if (disi == null)" check, which is great; do we have a test that tickles that? BitDocIdSet always returns a non-null iterator so this branch would never be used currently. But I think it's still a good practice since DISI.iterator is allowed to return null?
        Hide
        Adrien Grand added a comment -

        Here is a new patch that iterates on Robert's and tries to address Mike's comments:

        • Updated to trunk and uses the new iterator from LUCENE-6315
        • PrefixCodedTerms is @lucene.internal
        • When building the scorer we use == instead of equals to compare fields (which is correct as per FieldTermIterator.field() javadocs)
        Show
        Adrien Grand added a comment - Here is a new patch that iterates on Robert's and tries to address Mike's comments: Updated to trunk and uses the new iterator from LUCENE-6315 PrefixCodedTerms is @lucene.internal When building the scorer we use == instead of equals to compare fields (which is correct as per FieldTermIterator.field() javadocs)
        Hide
        Michael McCandless added a comment -

        +1, thanks Adrien Grand!

        Show
        Michael McCandless added a comment - +1, thanks Adrien Grand !
        Hide
        ASF subversion and git services added a comment -

        Commit 1678164 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1678164 ]

        LUCENE-6350: TermsQuery is now compressed with PrefixCodedTerms.

        Show
        ASF subversion and git services added a comment - Commit 1678164 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1678164 ] LUCENE-6350 : TermsQuery is now compressed with PrefixCodedTerms.
        Hide
        ASF subversion and git services added a comment -

        Commit 1678167 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1678167 ]

        LUCENE-6350: TermsQuery is now compressed with PrefixCodedTerms.

        Show
        ASF subversion and git services added a comment - Commit 1678167 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1678167 ] LUCENE-6350 : TermsQuery is now compressed with PrefixCodedTerms.
        Hide
        David Smiley added a comment -

        This PrefixCodedTerms utility is new to me – pretty cool. It seems similar to an FST configured to be an FSA. Have they been benchmarked between each other before? I would expect the FST/FSA to compress better.

        Show
        David Smiley added a comment - This PrefixCodedTerms utility is new to me – pretty cool. It seems similar to an FST configured to be an FSA. Have they been benchmarked between each other before? I would expect the FST/FSA to compress better.
        Hide
        Anshum Gupta added a comment -

        Bulk close for 5.2.0.

        Show
        Anshum Gupta added a comment - Bulk close for 5.2.0.

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development