Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1, 3.2, 3.3, 3.4, 4.0-ALPHA
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: highlighter
    • Labels:
      None

      Description

      Since LUCENE-1824 was committed, Solr's FragmentsBuilder can cut snippets at "natural" boundaries. To make full use of that feature, Solr should support configuring an arbitrary BoundaryScanner in solrconfig.xml.

      1. SOLR-2749.patch
        24 kB
        Koji Sekiguchi
      2. SOLR-2749.patch
        20 kB
        Koji Sekiguchi
      3. SOLR-2749.patch
        15 kB
        Koji Sekiguchi

          Activity

          Koji Sekiguchi added a comment -

          Draft, halfway patch.

          Koji Sekiguchi added a comment -

          New patch. Almost done except test cases.

          Koji Sekiguchi added a comment -

          New patch. I added a test case. Will commit tonight.

          Koji Sekiguchi added a comment -

          trunk: Committed revision 1170616.
          3x: Committed revision 1170620.

          Uwe Schindler added a comment -

          Bulk close after 3.5 is released

          Mike added a comment -

          Hi, I just installed trunk, and I'm still seeing this problem. Not sure how to provide a STR, but in snippets I'm seeing words cut in the middle rather than at their boundaries.

          If it's useful, the query I'm sending is:

          INFO: [] webapp=/solr path=/select/ params={
          sort=score+asc&
          fl=id,absolute_url,court_id,local_path,source,download_url,status,dateFiled&
          hl.fl=text,caseName,westCite,docketNumber,lexisCite,court_citation_string&
          f.text.hl.snippets=5&
          hl=true&
          q=willingness&
          fq=dateFiled:{TO}&
          fq={!tag%3Ddt}court_exact"ca5"OR"ca4"OR"ca7"OR"ca1"OR"ca3"OR"ca2"OR"scotus"OR"ca9"OR"ca8"OR"all"OR"ca11"OR"ca10"OR"cadc"OR"cafc")
          fq={!tag%3Ddt}status_exact"Non-Precedential"OR"Relating-to+orders"OR"Precedential"OR"Errata")&
          f.docketNumber.hl.alternateField=docketNumber&
          f.docketNumber.hl.fragListBuilder=single&
          f.lexisCite.hl.fragListBuilder=single&
          f.caseName.hl.fragListBuilder=single&
          f.westCite.hl.fragListBuilder=single&
          f.court_citation_string.hl.fragListBuilder=single&

          f.text.hl.alternateField=text&
          f.caseName.hl.alternateField=caseName&
          f.court_citation_string.hl.alternateField=court_citation_string
          f.lexisCite.hl.alternateField=lexisCite&
          f.westCite.hl.alternateField=westCite&
          f.text.hl.maxAlternateFieldLength=500&
          }

          And I'm getting a snippet that contains:

          ...g and willingness to read with care.” Rosenau v. Unifund Corp., 539 F.3d 218, 221 (3d Cir. 2008) (internal...

          You can see the first and last word are both cut off.

          Koji Sekiguchi added a comment -

          Are you sure that you have a boundaryScanner tag in your solrconfig.xml?

          <boundaryScanner name="default"
                           default="true"
                           class="solr.highlight.SimpleBoundaryScanner">
            <lst name="defaults">
              <str name="hl.bs.maxScan">10</str>
              <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
            </lst>
          </boundaryScanner>
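
For readers unfamiliar with the two settings in the configuration above, the following is a minimal sketch of how a character-based boundary scanner could use them. It is a simplified model written for illustration, not the Lucene or Solr source. It assumes that hl.bs.chars lists the characters treated as boundaries, and that hl.bs.maxScan caps how far the scanner looks before giving up and keeping the raw fragment offset. The function names `find_start_offset` and `find_end_offset` are hypothetical.

```python
# Illustrative sketch only: a simplified model of a character-based
# boundary scanner. Names and structure are assumptions, not Solr code.

BOUNDARY_CHARS = set(".,!? \t\n\r")  # mirrors hl.bs.chars in the config above
MAX_SCAN = 10                         # mirrors hl.bs.maxScan in the config above


def find_start_offset(text, start, max_scan=MAX_SCAN, chars=BOUNDARY_CHARS):
    """Move a fragment start backward to just after the nearest boundary char."""
    if start < 1 or start > len(text):
        return start
    offset, scanned = start, 0
    while offset > 0 and scanned < max_scan:
        if text[offset - 1] in chars:
            return offset
        offset -= 1
        scanned += 1
    # Reaching the beginning of the text counts as a boundary; otherwise
    # nothing was found within max_scan characters, so keep the raw offset.
    return 0 if offset == 0 else start


def find_end_offset(text, end, max_scan=MAX_SCAN, chars=BOUNDARY_CHARS):
    """Move a fragment end forward to the nearest boundary char."""
    if end < 0 or end > len(text):
        return end
    offset, scanned = end, 0
    while offset < len(text) and scanned < max_scan:
        if text[offset] in chars:
            return offset
        offset += 1
        scanned += 1
    return end  # no boundary within max_scan characters


text = "skill in reading and willingness to read"
raw_start, raw_end = 15, 25          # both fall in the middle of a word
print(text[raw_start:raw_end])       # "g and will"
start = find_start_offset(text, raw_start)
end = find_end_offset(text, raw_end)
print(text[start:end])               # "reading and willingness"
```

In this model, a fragment whose raw offsets land mid-word is widened to whole words as long as a boundary character lies within maxScan characters. If none does, the raw offset is kept and the word stays cut.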
          
          Mike added a comment -

          Oops. I suppose that would do it, huh? Is there a reason why this isn't in the config by default? Seems like one more place for a newbie to Solr (like myself) to miss a useful feature.

          I'll update the wiki as well to make it clear what the BoundaryScanners are for and what a valid entry for them looks like.


            People

            • Assignee:
              Koji Sekiguchi
              Reporter:
              Koji Sekiguchi