  Solr / SOLR-882

HTMLStripReader improvement - padding corrected for hexadecimal entities, option not to emit padding at all added

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None
    • Labels: None

    Description

      Improvements to HTMLStripReader:

      • fix padding of hexadecimal entities (currently off by 1)
      • add an option not to emit padding at all. In some applications, padding emitted after an entity such as &oacute; (ó) splits a word that is in fact a single term.
      • add entities written entirely in uppercase, which browsers also recognize.
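
The padding behaviour described in the list above can be illustrated with a minimal sketch. It is not the attached patch and not the HTMLStripReader code: the class EntityPaddingSketch, the method decode and the emitPadding flag are hypothetical names chosen for illustration, and only numeric entities are handled. The assumption is that padding exists to keep output offsets aligned with input offsets, so each decoded entity is followed by enough spaces to match the length of the original entity text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the HTMLStripReader implementation.
final class EntityPaddingSketch {

    // Matches hexadecimal (&#xF3;) and decimal (&#243;) numeric entities.
    // Digit counts are bounded so that parsing cannot overflow an int.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Decodes numeric entities. With emitPadding == true, spaces are appended
    // after each decoded character so the output is as long as the input.
    static String decode(String input, boolean emitPadding) {
        Matcher m = NUMERIC_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                int before = out.length();
                out.appendCodePoint(codePoint);
                if (emitPadding) {
                    // Entity length minus the chars actually written. The whole
                    // match is measured, including the 'x' of a hex entity.
                    int pad = (m.end() - m.start()) - (out.length() - before);
                    for (int i = 0; i < pad; i++) {
                        out.append(' ');
                    }
                }
            } else {
                // Not a valid code point: copy the entity text through unchanged.
                out.append(input, m.start(), m.end());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "kr&#xF3;l";
        // Padded: same length as the input, but spaces now split the word.
        System.out.println("[" + decode(word, true) + "]");
        // Unpadded: the word stays a single term, offsets no longer align.
        System.out.println("[" + decode(word, false) + "]");
    }
}
```

In this sketch the padded form preserves character offsets at the cost of breaking the word in two, while the unpadded form keeps the term intact. That is the trade-off the second item describes.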

      Attachments

        1. patch
          18 kB
          Dawid Weiss

            People

              Assignee: Steven Rowe (sarowe)
              Reporter: Dawid Weiss (dawidweiss)
              Votes: 1
              Watchers: 1
