Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Scorer currently has to implement a whole bunch of methods that are never called. The only method that Scorer uses in addition to the methods on DocIdSetIterator is freq(), and as currently implemented this means different things on different Scorers:

      • TermScorer returns its underlying termfreq
      • MinShouldMatchScorer returns how many of its subscorers are matching
      • {Exact|Sloppy}

        PhraseScorer returns how many phrases it has found on a document

      In addition, freq() is never actually called on TermScorer, and it's only used in explain() on the phrase scorers.

      We should make Scorer extend DocIdSetIterator instead. In place of freq(), Scorer would have a coord() method that by default returns 1, and for boolean scorers returns how many subscorers are matching.

      1. LUCENE-6272.patch
        54 kB
        Alan Woodward
      2. LUCENE-6272.patch
        68 kB
        Alan Woodward
      3. LUCENE-6272.patch
        116 kB
        Alan Woodward

        Activity

        Hide
        Alan Woodward added a comment -

        Patch. I also added a PhraseScorer abstract class with a phraseFreq() method that PhraseWeight.explain() calls.

        There are a couple of tests that walk the Scorer tree and check that TermScorers return the correct freq() which I think we can just nuke? Seeing as TermScorer.freq() is literally called nowhere else in the codebase.

        Alternatively, we could keep the existing freq() implementations for backwards-compatibility purposes, deprecating them in 5x and removing in trunk.

        Show
        Alan Woodward added a comment - Patch. I also added a PhraseScorer abstract class with a phraseFreq() method that PhraseWeight.explain() calls. There are a couple of tests that walk the Scorer tree and check that TermScorers return the correct freq() which I think we can just nuke? Seeing as TermScorer.freq() is literally called nowhere else in the codebase. Alternatively, we could keep the existing freq() implementations for backwards-compatibility purposes, deprecating them in 5x and removing in trunk.
        Hide
        Robert Muir added a comment -

        I think its complicated to change the semantics of freq() at the same time as refactoring the class hierarchy.

        freq() and getChildren() are still being discussed here: LUCENE-6229

        Show
        Robert Muir added a comment - I think its complicated to change the semantics of freq() at the same time as refactoring the class hierarchy. freq() and getChildren() are still being discussed here: LUCENE-6229
        Hide
        Alan Woodward added a comment -

        OK, so just add freq() as an abstract method directly on Scorer then? It's not actually called as PostingsEnum.freq() anywhere.

        Show
        Alan Woodward added a comment - OK, so just add freq() as an abstract method directly on Scorer then? It's not actually called as PostingsEnum.freq() anywhere.
        Hide
        Robert Muir added a comment -

        Yes, thats how it used to be. DocsEnum already had freq() and attributes(), so Scorer just built on that.

        Show
        Robert Muir added a comment - Yes, thats how it used to be. DocsEnum already had freq() and attributes(), so Scorer just built on that.
        Hide
        Alan Woodward added a comment -

        Updated patch, just making Scorer extend DocIdSetIterator and adding freq() as an abstract method directly on Scorer.

        Show
        Alan Woodward added a comment - Updated patch, just making Scorer extend DocIdSetIterator and adding freq() as an abstract method directly on Scorer.
        Hide
        Robert Muir added a comment -

        I like it, removes a lot of hassle.

        btw what IDE do you use? Is it intellij? I noticed all the imports in the files modified switch the order of java.XXX vs org.apache.lucene.XXX imports. I think these IDE templates need to be consistent here.

        Show
        Robert Muir added a comment - I like it, removes a lot of hassle. btw what IDE do you use? Is it intellij? I noticed all the imports in the files modified switch the order of java.XXX vs org.apache.lucene.XXX imports. I think these IDE templates need to be consistent here.
        Hide
        Alan Woodward added a comment -

        Yes, IntelliJ. I've got the template set up as:

        • other imports
        • <blank line>
        • import javax.*
        • import java.*
        • <blank line>
        • static imports
        Show
        Alan Woodward added a comment - Yes, IntelliJ. I've got the template set up as: other imports <blank line> import javax.* import java.* <blank line> static imports
        Hide
        Alan Woodward added a comment -

        Which is obviously different to what everyone else has... Hmpf, I thought I had set it up to be consistent with the project. Will change.

        Show
        Alan Woodward added a comment - Which is obviously different to what everyone else has... Hmpf, I thought I had set it up to be consistent with the project. Will change.
        Hide
        Shawn Heisey added a comment -

        I use eclipse. Hopefully one day I can switch, but haven't had enough time to learn IntelliJ yet.

        I haven't looked at everything that gets set by the idea/eclipse targets, but I would have hoped that those targets would force all formatting templates at the project level. I know that for eclipse, the general formatting (indentation, etc) is set by the eclipse target.

        Show
        Shawn Heisey added a comment - I use eclipse. Hopefully one day I can switch, but haven't had enough time to learn IntelliJ yet. I haven't looked at everything that gets set by the idea/eclipse targets, but I would have hoped that those targets would force all formatting templates at the project level. I know that for eclipse, the general formatting (indentation, etc) is set by the eclipse target.
        Hide
        Alan Woodward added a comment -

        Patch with import noise removed.

        Show
        Alan Woodward added a comment - Patch with import noise removed.
        Hide
        Robert Muir added a comment -

        Thanks. In the CHANGES file can we say no longer extends DocsEnum (i think its just a typo).

        +1

        Show
        Robert Muir added a comment - Thanks. In the CHANGES file can we say no longer extends DocsEnum (i think its just a typo). +1
        Hide
        Alan Woodward added a comment -

        Oops, yes. I think I need to go to bed.

        Show
        Alan Woodward added a comment - Oops, yes. I think I need to go to bed.
        Hide
        ASF subversion and git services added a comment -

        Commit 1661376 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1661376 ]

        LUCENE-6272: Scorer extends DocIdSetIterator directly

        Show
        ASF subversion and git services added a comment - Commit 1661376 from Alan Woodward in branch 'dev/trunk' [ https://svn.apache.org/r1661376 ] LUCENE-6272 : Scorer extends DocIdSetIterator directly
        Hide
        ASF subversion and git services added a comment -

        Commit 1661381 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1661381 ]

        LUCENE-6272: Scorer extends DocIdSetIterator directly

        Show
        ASF subversion and git services added a comment - Commit 1661381 from Alan Woodward in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1661381 ] LUCENE-6272 : Scorer extends DocIdSetIterator directly
        Hide
        Robert Muir added a comment -

        Thanks Alan!

        Show
        Robert Muir added a comment - Thanks Alan!
        Hide
        ASF subversion and git services added a comment -

        Commit 1661396 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1661396 ]

        LUCENE-6270: Fix merge issues with LUCENE-6272.

        Show
        ASF subversion and git services added a comment - Commit 1661396 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1661396 ] LUCENE-6270 : Fix merge issues with LUCENE-6272 .
        Hide
        Timothy Potter added a comment -

        Bulk close after 5.1 release

        Show
        Timothy Potter added a comment - Bulk close after 5.1 release

          People

          • Assignee:
            Alan Woodward
            Reporter:
            Alan Woodward
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development