  1. Lucene - Core
  2. LUCENE-3440

FastVectorHighlighter: IDF-weighted terms for ordered fragments



    • Lucene Fields:
      New, Patch Available


      The FastVectorHighlighter gives every term found in a fragment an equal weight. As a result, fragments with a large number of matching words or, in the worst case, a large number of very common words are ranked higher than fragments that contain all of the terms used in the original query.

      This patch provides ordered fragments with IDF-weighted terms:

      total weight = total weight + IDF for unique term per fragment * boost of query;

      The ranking formula should be the same as, or at least similar to, the one used in org.apache.lucene.search.highlight.QueryTermScorer.
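
      As a rough illustration of the formula above (not the code from the attached patches), the per-fragment weight could be accumulated as in the sketch below. The class FragmentWeightSketch, its method names, the docFreqs map and the IDF variant 1 + ln(numDocs / (docFreq + 1)) are all assumptions made for this example; the patch may compute IDF differently.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene or of the patch.
class FragmentWeightSketch {

    // Assumed IDF variant: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int numDocs, int docFreq) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // total weight = total weight + IDF for unique term per fragment * boost of query
    static double totalWeight(List<String> fragmentTerms,
                              Map<String, Integer> docFreqs,
                              int numDocs,
                              double queryBoost) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (String term : fragmentTerms) {
            if (!seen.add(term)) {
                continue; // a repeated term adds no further weight
            }
            Integer docFreq = docFreqs.get(term);
            if (docFreq == null) {
                continue; // term has no known document frequency, so it is skipped
            }
            total += idf(numDocs, docFreq) * queryBoost;
        }
        return total;
    }
}
```

      Under these assumptions, a fragment that repeats one very common term several times scores lower than a fragment that contains each query term once, which is the ordering the issue asks for.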

      The patch is simple, but it works for us.

      Some ideas:

      • A better approach would be moving the whole fragments-scoring into a separate class.
      • Switch scoring via parameter
      • Exact phrases should be given an even better score, regardless of whether a phrase query was executed or not
      • edismax/dismax-parameters pf, ps and pf^boost should be observed and corresponding fragments should be ranked higher
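
      The first two ideas (a separate scoring class, selectable via a parameter) might look roughly like the sketch below. FragmentScorer and ScoringMode are hypothetical names invented for this example; they are not existing Lucene API and not what the patch implements.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical strategy interface: fragment scoring moved out of the highlighter.
interface FragmentScorer {
    double score(List<String> matchedTerms, Map<String, Double> termWeights);
}

// Hypothetical parameter that switches between the scoring strategies.
enum ScoringMode {
    EQUAL, WEIGHTED;

    FragmentScorer scorer() {
        switch (this) {
            case WEIGHTED:
                // Each distinct term contributes its precomputed weight (e.g. IDF * boost) once.
                return (terms, weights) -> {
                    double total = 0.0;
                    for (String term : new HashSet<>(terms)) {
                        total += weights.getOrDefault(term, 0.0);
                    }
                    return total;
                };
            default:
                // Equal weighting: every matched term occurrence counts as 1.
                return (terms, weights) -> (double) terms.size();
        }
    }
}
```

      A caller would pick the strategy once, for example ScoringMode.WEIGHTED.scorer(), and apply it to every fragment before sorting.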


        1. weight-vs-boost_table01.html
          0.6 kB
          Sebastian Lutze
        2. weight-vs-boost_table02.html
          1 kB
          Sebastian Lutze
        3. LUCENE-4.0-SNAPSHOT-3440-9.patch
          63 kB
          Sebastian Lutze
        4. LUCENE-3440.patch
          60 kB
          Koji Sekiguchi
        5. LUCENE-3440.patch
          61 kB
          Sebastian Lutze
        6. LUCENE-3440_3.6.1-SNAPSHOT.patch
          76 kB
          Sebastian Lutze
        7. LUCENE-3440.patch
          64 kB
          Sebastian Lutze



            • Assignee:
              koji Koji Sekiguchi
              mdz-munich Sebastian Lutze
            • Votes:
              0
            • Watchers:
              5


              • Created: