Lucene - Core / LUCENE-2878

Allow Scorer to expose positions and payloads aka. nuke spans


    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Positions Branch
    • Fix Version/s: Positions Branch
    • Component/s: core/search
    • Labels:
    • Lucene Fields:
      New, Patch Available


      Currently we have two somewhat separate types of queries: the ones that can make use of positions (mainly spans) and payloads (spans), and the rest. Yet Span*Query doesn't really do scoring comparable to what other queries do, and at the end of the day they duplicate a lot of code all over Lucene. Span*Queries are also limited to other Span*Query instances, so you cannot use a TermQuery or a BooleanQuery with SpanNear or anything like that.
      Besides the Span*Query limitation, other queries lack a quite interesting feature: they cannot score based on term proximity, since scorers don't expose any positional information. All those problems have bugged me for a while now, so I started working on this using the bulkpostings API. I would have done that first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. I started by adding a new Positions class which users can pull from a scorer. To prevent unnecessary positions enums I added ScorerContext#needsPositions, and eventually Scorer#needsPayloads, to create the corresponding enum on demand. Yet currently only TermQuery / TermScorer implements this API; the others simply return null instead.
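      The idea of pulling positions from a scorer only when they were requested can be pictured with a minimal, self-contained sketch. Every type below (Positions, ScorerContext, ToyTermScorer, PositionsDemo) is a hypothetical stand-in written for illustration, not the code in the patch and not Lucene's API. The sketch assumes positions within a document are sorted in ascending order.

```java
/** Hypothetical stand-in for the proposed positions view; not Lucene's actual API. */
interface Positions {
    int NO_MORE_POSITIONS = Integer.MAX_VALUE;

    /** Returns the next position in the current document, or NO_MORE_POSITIONS. */
    int nextPosition();
}

/** Hypothetical stand-in for the flag carried by a scorer context. */
final class ScorerContext {
    final boolean needsPositions;

    ScorerContext(boolean needsPositions) {
        this.needsPositions = needsPositions;
    }
}

/** Toy single-document term scorer that builds a positions enum only on request. */
final class ToyTermScorer {
    private final int[] positionsInDoc; // assumed sorted ascending
    private final ScorerContext context;

    ToyTermScorer(int[] positionsInDoc, ScorerContext context) {
        this.positionsInDoc = positionsInDoc.clone();
        this.context = context;
    }

    /** Returns null when positions were not requested, as the description says other scorers do. */
    Positions positions() {
        if (!context.needsPositions) {
            return null;
        }
        return new Positions() {
            private int next = 0;

            @Override
            public int nextPosition() {
                return next < positionsInDoc.length ? positionsInDoc[next++] : NO_MORE_POSITIONS;
            }
        };
    }
}

final class PositionsDemo {
    /** Smallest gap between a position of term a and a position of term b, or -1 if unavailable. */
    static int minDistance(ToyTermScorer a, ToyTermScorer b) {
        Positions pa = a.positions();
        Positions pb = b.positions();
        if (pa == null || pb == null) {
            return -1; // at least one scorer was created without positions
        }
        int x = pa.nextPosition();
        int y = pb.nextPosition();
        int best = Integer.MAX_VALUE;
        // Two-pointer walk over both sorted position lists.
        while (x != Positions.NO_MORE_POSITIONS && y != Positions.NO_MORE_POSITIONS) {
            best = Math.min(best, Math.abs(x - y));
            if (x < y) {
                x = pa.nextPosition();
            } else {
                y = pb.nextPosition();
            }
        }
        return best == Integer.MAX_VALUE ? -1 : best;
    }

    public static void main(String[] args) {
        ScorerContext withPositions = new ScorerContext(true);
        ToyTermScorer a = new ToyTermScorer(new int[] {1, 10, 20}, withPositions);
        ToyTermScorer b = new ToyTermScorer(new int[] {4, 12}, withPositions);
        System.out.println(minDistance(a, b));
    }
}
```

      A proximity-aware score could then be derived from such a distance, which is the kind of feature the description says non-span queries currently cannot offer.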
      To show that the API really works and that our BulkPostings work fine with positions too, I cut over SpanTermQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the position bulk-reading implementations got some exercise and now all work with positions, while payloads for bulk reading are kind of experimental in the patch and only work with the Standard codec.
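      One way to picture a span built on top of a term's positions is a thin adapter that turns each position p into the span [p, p + 1). This is only a sketch under that assumption; PositionSource, TermSpanAdapter and SpanAdapterDemo are hypothetical names, not classes from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical positions source, standing in for what a term scorer would expose. */
interface PositionSource {
    int NO_MORE_POSITIONS = Integer.MAX_VALUE;

    /** Returns the next position, or NO_MORE_POSITIONS when exhausted. */
    int nextPosition();
}

/** Hypothetical span view: a term at position p covers the half-open interval [p, p + 1). */
final class TermSpanAdapter {
    private final PositionSource source;
    private int start = -1;

    TermSpanAdapter(PositionSource source) {
        this.source = source;
    }

    /** Advances to the next span; returns false once the positions are exhausted. */
    boolean next() {
        start = source.nextPosition();
        return start != PositionSource.NO_MORE_POSITIONS;
    }

    int start() {
        return start;
    }

    int end() {
        return start + 1; // exclusive end, one past the term's position
    }
}

final class SpanAdapterDemo {
    /** Wraps a plain array as a position source, for illustration only. */
    static PositionSource fromArray(int[] positions) {
        return new PositionSource() {
            private int i = 0;

            @Override
            public int nextPosition() {
                return i < positions.length ? positions[i++] : NO_MORE_POSITIONS;
            }
        };
    }

    /** Collects every span as a {start, end} pair. */
    static List<int[]> collect(int[] positions) {
        TermSpanAdapter spans = new TermSpanAdapter(fromArray(positions));
        List<int[]> out = new ArrayList<>();
        while (spans.next()) {
            out.add(new int[] {spans.start(), spans.end()});
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[] span : collect(new int[] {3, 7})) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```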

      So all spans now work on top of TermScorer (I truly hate spans since today), including the ones that need payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute.

      I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk first, but after that pain today I need a break first.

      The patch passes all core tests ( still fails but I didn't look into the MemoryIndex BulkPostings API yet)

      1. LUCENE-2878.patch
        646 kB
        Alan Woodward
      2. LUCENE-2878.patch
        655 kB
        Alan Woodward
      3. LUCENE-2878.patch
        690 kB
        Alan Woodward
      4. LUCENE-2878.patch
        623 kB
        Alan Woodward
      5. LUCENE-2878.patch
        1.15 MB
        Alan Woodward
      6. LUCENE-2878.patch
        1.14 MB
        Alan Woodward
      7. LUCENE-2878.patch
        9 kB
        Alan Woodward
      8. LUCENE-2878.patch
        3 kB
        Michael McCandless
      9. LUCENE-2878-vs-trunk.patch
        395 kB
        Simon Willnauer
      10. LUCENE-2878.patch
        14 kB
        Alan Woodward
      11. LUCENE-2878.patch
        27 kB
        Alan Woodward
      12. LUCENE-2878.patch
        11 kB
        Alan Woodward
      13. LUCENE-2878.patch
        9.49 MB
        Alan Woodward
      14. LUCENE-2878.patch
        37 kB
        Alan Woodward
      15. LUCENE-2878.patch
        34 kB
        Alan Woodward
      16. LUCENE-2878.patch
        23 kB
        Alan Woodward
      17. LUCENE-2878.patch
        13 kB
        Simon Willnauer
      18. LUCENE-2878.patch
        11 kB
        Alan Woodward
      19. LUCENE-2878.patch
        5 kB
        Alan Woodward
      20. LUCENE-2878.patch
        41 kB
        Simon Willnauer
      21. LUCENE-2878.patch
        36 kB
        Simon Willnauer
      22. LUCENE-2878.patch
        17 kB
        Alan Woodward
      23. LUCENE-2878.patch
        54 kB
        Alan Woodward
      24. LUCENE-2878.patch
        115 kB
        Mike Sokolov
      25. LUCENE-2878.patch
        84 kB
        Simon Willnauer
      26. LUCENE-2878.patch
        49 kB
        Mike Sokolov
      27. LUCENE-2878.patch
        29 kB
        Mike Sokolov
      28. PosHighlighter.patch
        22 kB
        Mike Sokolov
      29. LUCENE-2878-OR.patch
        125 kB
        Mike Sokolov
      30. PosHighlighter.patch
        21 kB
        Mike Sokolov
      31. LUCENE-2878_trunk.patch
        120 kB
        Simon Willnauer
      32. LUCENE-2878_trunk.patch
        119 kB
        Simon Willnauer
      33. LUCENE-2878.patch
        119 kB
        Simon Willnauer
      34. LUCENE-2878.patch
        73 kB
        Simon Willnauer
      35. LUCENE-2878.patch
        45 kB
        Simon Willnauer
      36. LUCENE-2878.patch
        101 kB
        Simon Willnauer



            • Assignee:
              Robert Muir
              Simon Willnauer
            • Votes:
              11
            • Watchers:
              42


              • Created: