Tika / TIKA-1500

FeedParser extracts XML markup with BodyContentHandler

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6
    • Fix Version/s: 1.7
    • Component/s: parser
    • Labels: None

    Description

      I am using FeedParser to extract text and links from feeds and have discovered that the extracted text contains XML markup.
      Usually FeedParser strips markup from text when generating SAX events,
      but one line in the parser skips this stripping step, so the raw markup reaches the content handler.
      The fix is trivial. I will provide a patch.
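
The kind of stripping described above can be illustrated with a minimal, standalone sketch. It is not the attached patch and not Tika's actual code: the class `FeedTextSketch` and the helpers `stripTags`, `emitText`, and `collect` are hypothetical names chosen for illustration, and the regex-based tag removal is an assumption about one simple way to do it. Only the standard SAX API shipped with the JDK is used.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FeedTextSketch {

    // Hypothetical helper: replaces anything that looks like a tag with a space,
    // then collapses runs of whitespace. Entities such as &amp; are left untouched.
    static String stripTags(String value) {
        if (value == null) {
            return "";
        }
        return value.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Emits the text of one feed entry as a SAX characters event.
    // The bug described in the report corresponds to passing rawText through
    // without the stripTags call at a single call site.
    static void emitText(ContentHandler handler, String rawText) throws SAXException {
        char[] chars = stripTags(rawText).toCharArray();
        handler.characters(chars, 0, chars.length);
    }

    // Collects the emitted characters, roughly what a text-only handler would see.
    static String collect(String rawText) {
        final StringBuilder out = new StringBuilder();
        ContentHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };
        try {
            emitText(handler, rawText);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect("<p>Hello <b>feed</b> reader</p>"));
    }
}
```

With the stripping in place, a handler that only collects character events receives plain text rather than markup. A regex is a blunt tool for this job; it is used here only to keep the sketch short.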

      Attachments

        1. TIKA-1500.patch
          0.8 kB
          Reinhard Pötz

          People

            Assignee: Tyler Bui-Palsulich (tpalsulich)
            Reporter: Reinhard Pötz (reinhard)
            Votes: 0
            Watchers: 3
