Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1, 3.2, 3.3, 3.4, 4.0-ALPHA
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: highlighter
    • Labels:
      None

      Description

      Now that LUCENE-1824 has been committed, Solr's FragmentsBuilder can cut snippets at "natural" boundaries. To make full use of that feature, Solr should let users configure an arbitrary BoundaryScanner in solrconfig.xml.
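
      As an illustration of what a non-default scanner entry might look like, here is a minimal sketch of a BreakIterator-based boundary scanner. The name "breakIterator" is an arbitrary label chosen for this example, and the class and parameter names should be checked against the example solrconfig.xml shipped with your Solr version before relying on them.

      ```xml
      <!-- Sketch only: a second boundaryScanner entry, placed alongside the
           default one inside the highlighting component's configuration.
           The name "breakIterator" is an illustrative label. -->
      <boundaryScanner name="breakIterator"
                       class="solr.highlight.BreakIteratorBoundaryScanner">
        <lst name="defaults">
          <!-- Boundary type: CHARACTER, WORD, LINE or SENTENCE -->
          <str name="hl.bs.type">WORD</str>
          <!-- Language and country used to build the Locale for the BreakIterator -->
          <str name="hl.bs.language">en</str>
          <str name="hl.bs.country">US</str>
        </lst>
      </boundaryScanner>
      ```

      A scanner defined this way would presumably be selected per request by name (for example hl.boundaryScanner=breakIterator). Boundary scanners are part of the FastVectorHighlighter code path, so the request would also need hl.useFastVectorHighlighter=true and fields indexed with term vectors, positions and offsets.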

      1. SOLR-2749.patch
        24 kB
        Koji Sekiguchi
      2. SOLR-2749.patch
        20 kB
        Koji Sekiguchi
      3. SOLR-2749.patch
        15 kB
        Koji Sekiguchi

          Activity

          Gavin made changes -
          Link This issue depends upon LUCENE-1824 [ LUCENE-1824 ]
          Gavin made changes -
          Link This issue depends on LUCENE-1824 [ LUCENE-1824 ]
          Mike added a comment -

          Oops. I suppose that would do it, huh? Is there a reason why this isn't in the config by default? Seems like one more place for a newbie to Solr (like myself) to miss a useful feature.

          I'll update the wiki as well to make it clear what the BoundaryScanners are for and what a valid entry for them looks like.

          Koji Sekiguchi added a comment -

          Are you sure that you have a boundaryScanner tag in your solrconfig.xml?

          <boundaryScanner name="default"
                           default="true"
                           class="solr.highlight.SimpleBoundaryScanner">
            <lst name="defaults">
              <str name="hl.bs.maxScan">10</str>
              <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
            </lst>
          </boundaryScanner>
          
          Mike added a comment -

          Hi, I just installed trunk, and I'm still seeing this problem. Not sure how to provide a STR, but in snippets I'm seeing words cut in the middle rather than at their boundaries.

          If it's useful, the query I'm sending is:

          INFO: [] webapp=/solr path=/select/ params={
          sort=score+asc&
          fl=id,absolute_url,court_id,local_path,source,download_url,status,dateFiled&
          hl.fl=text,caseName,westCite,docketNumber,lexisCite,court_citation_string&
          f.text.hl.snippets=5&
          hl=true&
          q=willingness&
          fq=dateFiled:{TO}&
          fq={!tag%3Ddt}court_exact"ca5"OR"ca4"OR"ca7"OR"ca1"OR"ca3"OR"ca2"OR"scotus"OR"ca9"OR"ca8"OR"all"OR"ca11"OR"ca10"OR"cadc"OR"cafc")
          fq={!tag%3Ddt}status_exact"Non-Precedential"OR"Relating-to+orders"OR"Precedential"OR"Errata")&
          f.docketNumber.hl.alternateField=docketNumber&
          f.docketNumber.hl.fragListBuilder=single&
          f.lexisCite.hl.fragListBuilder=single&
          f.caseName.hl.fragListBuilder=single&
          f.westCite.hl.fragListBuilder=single&
          f.court_citation_string.hl.fragListBuilder=single&

          f.text.hl.alternateField=text&
          f.caseName.hl.alternateField=caseName&
          f.court_citation_string.hl.alternateField=court_citation_string
          f.lexisCite.hl.alternateField=lexisCite&
          f.westCite.hl.alternateField=westCite&
          f.text.hl.maxAlternateFieldLength=500&
          }

          And I'm getting a snippet that contains:

          ...g and willingness to read with care.” Rosenau v. Unifund Corp., 539 F.3d 218, 221 (3d Cir. 2008) (internal...

          You can see the first and last word are both cut off.

          Uwe Schindler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Uwe Schindler added a comment -

          Bulk close after 3.5 is released

          Koji Sekiguchi made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Koji Sekiguchi added a comment -

          trunk: Committed revision 1170616.
          3x: Committed revision 1170620.

          Koji Sekiguchi made changes -
          Assignee Koji Sekiguchi [ koji ]
          Fix Version/s 3.5 [ 12317876 ]
          Fix Version/s 4.0 [ 12314992 ]
          Affects Version/s 3.3 [ 12316471 ]
          Affects Version/s 3.2 [ 12316172 ]
          Affects Version/s 3.1 [ 12314371 ]
          Affects Version/s 3.4 [ 12316683 ]
          Affects Version/s 4.0 [ 12314992 ]
          Koji Sekiguchi made changes -
          Attachment SOLR-2749.patch [ 12494364 ]
          Koji Sekiguchi added a comment -

          New patch. I added test case. Will commit tonight.

          Koji Sekiguchi made changes -
          Attachment SOLR-2749.patch [ 12494174 ]
          Koji Sekiguchi added a comment -

          New patch. Almost done except test cases.

          Koji Sekiguchi made changes -
          Attachment SOLR-2749.patch [ 12494009 ]
          Koji Sekiguchi added a comment -

          Draft, halfway patch.

          Koji Sekiguchi made changes -
          Field Original Value New Value
          Link This issue depends on LUCENE-1824 [ LUCENE-1824 ]
          Koji Sekiguchi created issue -

            People

            • Assignee:
              Koji Sekiguchi
              Reporter:
              Koji Sekiguchi
            • Votes:
              0
              Watchers:
              1
