  Solr / SOLR-882

HTMLStripReader improvement - padding corrected for hexadecimal entities, option not to emit padding at all added

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None
    • Labels: None

      Description

      Improvements to HTMLStripReader:

      • fix padding of hexadecimal entities (currently off by 1)
      • add an option not to emit padding at all. In certain applications, padding emitted after an entity (such as the one for ó) may split a word that is in fact a single term.
      • recognize entities written entirely in uppercase, which browsers also accept.
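
The padding behavior described above can be illustrated with a minimal sketch. This is not the attached patch and not Solr's actual HTMLStripReader code; the class name `EntityPaddingSketch` and the method `decode` are hypothetical, and only numeric entities are handled. It assumes padding means appending spaces after the decoded character so that the output has the same length as the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Solr.
public class EntityPaddingSketch {
    // Matches hexadecimal (&#xF3;) and decimal (&#243;) numeric character references.
    // Digit counts are bounded so Integer.parseInt cannot overflow.
    private static final Pattern NUMERIC_ENTITY =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    /**
     * Replaces numeric entities with the character they denote.
     * When pad is true, spaces are appended after each decoded character so
     * that the output keeps the input's length and offsets stay aligned.
     */
    public static String decode(String input, boolean pad) {
        Matcher m = NUMERIC_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            int codePoint = m.group(1) != null
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2), 10);
            if (!Character.isValidCodePoint(codePoint)) {
                // Leave references to invalid code points untouched.
                out.append(input, m.start(), m.end());
            } else {
                String decoded = new String(Character.toChars(codePoint));
                out.append(decoded);
                if (pad) {
                    // Padding covers the whole entity, including "&#", the
                    // optional "x" of the hexadecimal form, and ";".
                    int padding = (m.end() - m.start()) - decoded.length();
                    for (int i = 0; i < padding; i++) {
                        out.append(' ');
                    }
                }
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }
}
```

With `pad` set to `true`, the decoded text keeps the length of the input, but the spaces land in the middle of the word, so a whitespace tokenizer would see two terms. With `pad` set to `false`, the word stays intact at the cost of offsets no longer matching the original markup.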

        Attachments

        1. patch (18 kB, Dawid Weiss)


              People

              • Assignee: Steve Rowe (steve_rowe)
              • Reporter: Dawid Weiss (dawidweiss)
              • Votes: 1
              • Watchers: 1
