Tika / TIKA-975

LinkBuilder to optionally collapse anchor whitespace


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2
    • Fix Version/s: 1.3
    • Component/s: parser
    • Labels: None

    Description

      Links extracted by the LinkContentHandler contain the verbatim anchor text. This is usually fine, but many websites spread the anchor text over multiple lines or indent it with tabs or spaces, so the extracted text ends up containing newlines and runs of whitespace.

      This patch adds a boolean option to LinkContentHandler that toggles whitespace collapsing on or off. The default behaviour is unchanged and the API remains backward compatible.
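      To illustrate what the option does, here is a minimal, self-contained sketch. It is not Tika's code: the class AnchorTextBuilder and its methods are hypothetical names chosen for this example. It assumes anchor text arrives through a SAX-style characters(ch, start, length) callback, possibly split across several calls, and that "collapsing" means replacing every run of whitespace with a single space and trimming the ends. A no-argument constructor keeps verbatim output as the default, mirroring the backward-compatible behaviour described above.

```java
/**
 * Hypothetical sketch (not Tika's API): accumulates anchor text delivered
 * in SAX-style characters() chunks and optionally collapses whitespace.
 */
public final class AnchorTextBuilder {
    private final StringBuilder text = new StringBuilder();
    private final boolean collapseWhitespace;

    /** Default constructor: verbatim anchor text, as before the option existed. */
    public AnchorTextBuilder() {
        this(false);
    }

    /** @param collapseWhitespace if true, runs of whitespace become one space */
    public AnchorTextBuilder(boolean collapseWhitespace) {
        this.collapseWhitespace = collapseWhitespace;
    }

    /** Appends a chunk, mirroring ContentHandler.characters(ch, start, length). */
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    /** Returns the anchor text, either verbatim or with whitespace collapsed. */
    public String getText() {
        if (!collapseWhitespace) {
            return text.toString();
        }
        // \s+ matches runs of spaces, tabs, newlines, carriage returns, form feeds;
        // collapsing at read time means chunk boundaries do not matter.
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        char[] chunk = "\n\t\tRead   the\n\t\tdocs\n".toCharArray();
        AnchorTextBuilder collapsed = new AnchorTextBuilder(true);
        collapsed.characters(chunk, 0, chunk.length);
        System.out.println("[" + collapsed.getText() + "]"); // prints [Read the docs]
    }
}
```

      Collapsing once in getText(), rather than inside each characters() call, avoids having to track whether the previous chunk ended in whitespace. Note that Java's \s does not match non-breaking spaces (U+00A0); whether those should also be collapsed is a separate design decision.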

      Attachments

        1. TIKA-975-1.3-1.patch
          2 kB
          Markus Jelsma
        2. TIKA-975-1.3-2.patch
          4 kB
          Markus Jelsma


            People

              Assignee: Kenneth William Krugler
              Reporter: Markus Jelsma
              Votes: 0
              Watchers: 2
