Lucene - Core
  1. Lucene - Core
  2. LUCENE-4856

If no Passages are found for a doc, PostingsHighlighter should return first n sentences?

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3, 6.0
    • Component/s: modules/highlighter
    • Labels:
      None
    • Lucene Fields:
      New
    1. LUCENE-4856.patch
      16 kB
      Michael McCandless
    2. LUCENE-4856.patch
      5 kB
      Michael McCandless

      Activity

      Hide
      Michael McCandless added a comment -

      Patch ... it seems to work but I have no idea if I'm using BreakIterator properly.

      Show
      Michael McCandless added a comment - Patch ... it seems to work but I have no idea if I'm using BreakIterator properly.
      Hide
      Robert Muir added a comment -

      I think this would be extremely confusing with multiple fields...

      Show
      Robert Muir added a comment - I think this would be extremely confusing with multiple fields...
      Hide
      Robert Muir added a comment -
      • Maybe the method could be protected so if someone doesnt like whatever we default to, they can subclass.
      • What will it do if the document doesnt have the field(s)? Return null?
      • What score does the passage get?
      • If we change the default, some docs need changing: e.g. "If no highlights were found for a document, its value is <code>null</code>"
      • In the main loop, this if check could then go? Or maybe this is the place to do this instead?
              if (passages.length > 0) {
                // otherwise a null snippet
                highlights.put(doc, formatter.format(passages, content));
              }
        
      Show
      Robert Muir added a comment - Maybe the method could be protected so if someone doesnt like whatever we default to, they can subclass. What will it do if the document doesnt have the field(s)? Return null? What score does the passage get? If we change the default, some docs need changing: e.g. "If no highlights were found for a document, its value is <code>null</code>" In the main loop, this if check could then go? Or maybe this is the place to do this instead? if (passages.length > 0) { // otherwise a null snippet highlights.put(doc, formatter.format(passages, content)); }
      Hide
      Michael McCandless added a comment -

      Maybe the method could be protected so if someone doesnt like whatever we default to, they can subclass.

      Good idea, I'll fix that.

      What will it do if the document doesnt have the field(s)? Return null?

      It returns null ... I'll add a test.

      What score does the passage get?

      I currently leave it at 0 ... maybe we should do NaN? This would give
      the formatter a way to detect the "missing highlights"?

      If we change the default, some docs need changing: e.g. "If no highlights were found for a document, its value is <code>null</code>"

      OK, I'll add a nocommit so I don't forget this ...

      In the main loop, this if check could then go? Or maybe this is the place to do this instead?

      I'll move it up. I'm not sure the if check can go ... does
      BreakIterator ever return nothing?

      I think this would be extremely confusing with multiple fields...

      Hmmm, true. An app might highlight N fields and then would want to
      see null on some of those fields so that it knows to use the other
      field's highlights.

      But I think the more common case is highlighting a big field (eg the
      "body" field)? I would lean towards defaulting this on, and adding
      setter / or you subclass and override getEmptyHighlight to turn it
      on. Hmmm maybe getEmptyHighlight should take the field name...

      Show
      Michael McCandless added a comment - Maybe the method could be protected so if someone doesnt like whatever we default to, they can subclass. Good idea, I'll fix that. What will it do if the document doesnt have the field(s)? Return null? It returns null ... I'll add a test. What score does the passage get? I currently leave it at 0 ... maybe we should do NaN? This would give the formatter a way to detect the "missing highlights"? If we change the default, some docs need changing: e.g. "If no highlights were found for a document, its value is <code>null</code>" OK, I'll add a nocommit so I don't forget this ... In the main loop, this if check could then go? Or maybe this is the place to do this instead? I'll move it up. I'm not sure the if check can go ... does BreakIterator ever return nothing? I think this would be extremely confusing with multiple fields... Hmmm, true. An app might highlight N fields and then would want to see null on some of those fields so that it knows to use the other field's highlights. But I think the more common case is highlighting a big field (eg the "body" field)? I would lean towards defaulting this on, and adding setter / or you subclass and override getEmptyHighlight to turn it on. Hmmm maybe getEmptyHighlight should take the field name...
      Hide
      Robert Muir added a comment -

      But I think the more common case is highlighting a big field (eg the
      "body" field)? I would lean towards defaulting this on, and adding
      setter / or you subclass and override getEmptyHighlight to turn it
      on. Hmmm maybe getEmptyHighlight should take the field name...

      I think so too. passing the fieldname would probably help someone do whatever
      crazy thing they want... ultimately this whole highlighter should have just
      been a StoredFieldsVisitor i tell you

      Show
      Robert Muir added a comment - But I think the more common case is highlighting a big field (eg the "body" field)? I would lean towards defaulting this on, and adding setter / or you subclass and override getEmptyHighlight to turn it on. Hmmm maybe getEmptyHighlight should take the field name... I think so too. passing the fieldname would probably help someone do whatever crazy thing they want... ultimately this whole highlighter should have just been a StoredFieldsVisitor i tell you
      Hide
      Michael McCandless added a comment -

      New patch, I think folding in all feedback and fixing the nocommits. I changed how I interact with the BI to just use next()...

      Show
      Michael McCandless added a comment - New patch, I think folding in all feedback and fixing the nocommits. I changed how I interact with the BI to just use next()...
      Hide
      Robert Muir added a comment -

      +1, very nice. Especially like all the new tests.

      I trust the dead code is really dead, I don't know why this was there.
      Maybe something from a past iteration/hair-pulling session...

      Show
      Robert Muir added a comment - +1, very nice. Especially like all the new tests. I trust the dead code is really dead, I don't know why this was there. Maybe something from a past iteration/hair-pulling session...
      Hide
      Commit Tag Bot added a comment -

      [branch_4x commit] Michael McCandless
      http://svn.apache.org/viewvc?view=revision&revision=1458870

      LUCENE-4856: revert ... need to add option to Solr

      Show
      Commit Tag Bot added a comment - [branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1458870 LUCENE-4856 : revert ... need to add option to Solr
      Hide
      Commit Tag Bot added a comment -

      [branch_4x commit] Michael McCandless
      http://svn.apache.org/viewvc?view=revision&revision=1458867

      LUCENE-4856: If there are no matches for a given field, return the first maxPassages sentences

      Show
      Commit Tag Bot added a comment - [branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1458867 LUCENE-4856 : If there are no matches for a given field, return the first maxPassages sentences
      Hide
      Commit Tag Bot added a comment -

      [trunk commit] Michael McCandless
      http://svn.apache.org/viewvc?view=revision&revision=1458869

      LUCENE-4856: revert ... need to add option to Solr

      Show
      Commit Tag Bot added a comment - [trunk commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1458869 LUCENE-4856 : revert ... need to add option to Solr
      Hide
      Commit Tag Bot added a comment -

      [trunk commit] Michael McCandless
      http://svn.apache.org/viewvc?view=revision&revision=1458865

      LUCENE-4856: If there are no matches for a given field, return the first maxPassages sentences

      Show
      Commit Tag Bot added a comment - [trunk commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1458865 LUCENE-4856 : If there are no matches for a given field, return the first maxPassages sentences
      Hide
      Commit Tag Bot added a comment -

      [branch_4x commit] Michael McCandless
      http://svn.apache.org/viewvc?view=revision&revision=1458892

      LUCENE-4856: If there are no matches for a given field, return the first maxPassages sentences

      Show
      Commit Tag Bot added a comment - [branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1458892 LUCENE-4856 : If there are no matches for a given field, return the first maxPassages sentences
      Hide
      Commit Tag Bot added a comment -

      [trunk commit] Michael McCandless
      http://svn.apache.org/viewvc?view=revision&revision=1458889

      LUCENE-4856: If there are no matches for a given field, return the first maxPassages sentences

      Show
      Commit Tag Bot added a comment - [trunk commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1458889 LUCENE-4856 : If there are no matches for a given field, return the first maxPassages sentences
      Hide
      Uwe Schindler added a comment -

      Closed after release.

      Show
      Uwe Schindler added a comment - Closed after release.

        People

        • Assignee:
          Michael McCandless
          Reporter:
          Michael McCandless
        • Votes:
          0 Vote for this issue
          Watchers:
          1 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development