  1. Lucene - Core
  2. LUCENE-2878

Allow Scorer to expose positions and payloads aka. nuke spans

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: Positions Branch
    • Fix Version/s: 7.4, 8.0
    • Component/s: core/search
    • Lucene Fields: New, Patch Available

    Description

      Currently we have two somewhat separate types of queries: the ones that can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do, and at the end of the day they duplicate a lot of code all over Lucene. Span*Queries are also limited to other Span*Query instances, so you cannot use a TermQuery or a BooleanQuery with SpanNear or anything like that.
      Besides the Span*Query limitation, other queries lack a quite interesting feature: they cannot score based on term proximity, since scorers don't expose any positional information. All those problems have bugged me for a while now, so I started working on that using the bulkpostings API. I would have done that first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. I started by adding a new Positions class which users can pull from a scorer. To prevent unnecessary positions enums I added ScorerContext#needsPositions and eventually Scorer#needsPayloads to create the corresponding enum on demand. Yet, currently only TermQuery / TermScorer implements this API; the others simply return null instead.
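
      The idea of pulling positions from a scorer on demand, and using them for proximity, can be illustrated with a small self-contained sketch. This is not the code from the patch and does not use Lucene classes: ToyPositions, ToyScorerContext, ToyScorer, ToyTermScorer and minDistance are hypothetical names invented for illustration, and sorted per-document positions are an assumption of the sketch.

```java
import java.util.NoSuchElementException;

// Toy model only: all names below are hypothetical stand-ins, not Lucene's API.
final class PositionsSketch {

    /** Stand-in for a per-document positions enum (assumes positions are sorted ascending). */
    public static final class ToyPositions {
        private final int[] positions;
        private int cursor = 0;

        public ToyPositions(int[] positions) { this.positions = positions; }

        public boolean hasNext() { return cursor < positions.length; }

        public int next() {
            if (!hasNext()) throw new NoSuchElementException();
            return positions[cursor++];
        }
    }

    /** Stand-in for a context flag telling the scorer whether positions are needed. */
    public static final class ToyScorerContext {
        public final boolean needsPositions;

        public ToyScorerContext(boolean needsPositions) { this.needsPositions = needsPositions; }
    }

    /** Scorers that do not implement the positions API simply return null. */
    public static class ToyScorer {
        public ToyPositions positions() { return null; }
    }

    /** Only the term scorer exposes positions, and only when the context asked for them. */
    public static final class ToyTermScorer extends ToyScorer {
        private final int[] termPositions;
        private final ToyScorerContext context;

        public ToyTermScorer(int[] termPositions, ToyScorerContext context) {
            this.termPositions = termPositions;
            this.context = context;
        }

        @Override
        public ToyPositions positions() {
            // Create the enum on demand; skip the work when nobody needs positions.
            return context.needsPositions ? new ToyPositions(termPositions) : null;
        }
    }

    /** Smallest distance between any position of a and any position of b, or -1 if unavailable. */
    public static int minDistance(ToyPositions a, ToyPositions b) {
        if (a == null || b == null || !a.hasNext() || !b.hasNext()) return -1;
        int pa = a.next();
        int pb = b.next();
        int best = Math.abs(pa - pb);
        while (true) {
            // Advance whichever side is behind; stop when that side is exhausted.
            if (pa < pb) {
                if (!a.hasNext()) break;
                pa = a.next();
            } else {
                if (!b.hasNext()) break;
                pb = b.next();
            }
            best = Math.min(best, Math.abs(pa - pb));
        }
        return best;
    }

    public static void main(String[] args) {
        ToyScorerContext withPositions = new ToyScorerContext(true);
        ToyScorer left = new ToyTermScorer(new int[] {2, 10, 40}, withPositions);
        ToyScorer right = new ToyTermScorer(new int[] {13, 50}, withPositions);
        System.out.println(minDistance(left.positions(), right.positions()));
    }
}
```

      A proximity-aware scorer could turn such a distance into a score boost. When the context does not request positions, or the scorer does not implement the API, positions() returns null and the caller has to fall back to position-free scoring.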
      To show that the API really works and that our BulkPostings work fine with positions too, I cut over SpanTermQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the Position BulkReading implementations got some exercise and now all work with positions, while payloads for bulk reading are kind of experimental in the patch and only work with the Standard codec.

      So all spans now work on top of TermScorer (I truly hate spans since today), including the ones that need payloads (StandardCodec ONLY)! I didn't bother to implement the other codecs yet since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute.

      I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk first, but after that pain today I need a break first.

      The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails, but I haven't looked into the MemoryIndex BulkPostings API yet).

      Attachments

        1. PosHighlighter.patch
          21 kB
          Michael Sokolov
        2. PosHighlighter.patch
          22 kB
          Michael Sokolov
        3. LUCENE-2878-vs-trunk.patch
          395 kB
          Simon Willnauer
        4. LUCENE-2878-OR.patch
          125 kB
          Michael Sokolov
        5. LUCENE-2878.patch
          101 kB
          Simon Willnauer
        6. LUCENE-2878.patch
          45 kB
          Simon Willnauer
        7. LUCENE-2878.patch
          73 kB
          Simon Willnauer
        8. LUCENE-2878.patch
          119 kB
          Simon Willnauer
        9. LUCENE-2878.patch
          29 kB
          Michael Sokolov
        10. LUCENE-2878.patch
          49 kB
          Michael Sokolov
        11. LUCENE-2878.patch
          84 kB
          Simon Willnauer
        12. LUCENE-2878.patch
          115 kB
          Michael Sokolov
        13. LUCENE-2878.patch
          54 kB
          Alan Woodward
        14. LUCENE-2878.patch
          17 kB
          Alan Woodward
        15. LUCENE-2878.patch
          36 kB
          Simon Willnauer
        16. LUCENE-2878.patch
          41 kB
          Simon Willnauer
        17. LUCENE-2878.patch
          5 kB
          Alan Woodward
        18. LUCENE-2878.patch
          11 kB
          Alan Woodward
        19. LUCENE-2878.patch
          13 kB
          Simon Willnauer
        20. LUCENE-2878.patch
          23 kB
          Alan Woodward
        21. LUCENE-2878.patch
          34 kB
          Alan Woodward
        22. LUCENE-2878.patch
          37 kB
          Alan Woodward
        23. LUCENE-2878.patch
          9.49 MB
          Alan Woodward
        24. LUCENE-2878.patch
          11 kB
          Alan Woodward
        25. LUCENE-2878.patch
          27 kB
          Alan Woodward
        26. LUCENE-2878.patch
          14 kB
          Alan Woodward
        27. LUCENE-2878.patch
          3 kB
          Michael McCandless
        28. LUCENE-2878.patch
          9 kB
          Alan Woodward
        29. LUCENE-2878.patch
          1.14 MB
          Alan Woodward
        30. LUCENE-2878.patch
          1.15 MB
          Alan Woodward
        31. LUCENE-2878.patch
          623 kB
          Alan Woodward
        32. LUCENE-2878.patch
          690 kB
          Alan Woodward
        33. LUCENE-2878.patch
          655 kB
          Alan Woodward
        34. LUCENE-2878.patch
          646 kB
          Alan Woodward
        35. LUCENE-2878_trunk.patch
          119 kB
          Simon Willnauer
        36. LUCENE-2878_trunk.patch
          120 kB
          Simon Willnauer

            People

              rcmuir Robert Muir
              simonw Simon Willnauer
              Votes: 11
              Watchers: 41
