Apache Any23 (Retired) / ANY23-324

Replace net.sourceforge.nekohtml with jsoup

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2
    • Component/s: core
    • Labels: None

    Description

      A long-standing issue is the performance of the existing default TagSoupParser.java. Several open issues now trace back to limitations in how nekohtml parses HTML5, for example ANY23-317, ANY23-273 and ANY23-267, among several others.
      I propose to mark the TagSoupParser.java implementation as @Deprecated in the next release (possibly making it configurable via default-configuration.properties). I also propose to replace it with jsoup (https://jsoup.org/). AFAIK, Apache Tika also made this switch several years ago.
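
To make the "configurable via default-configuration.properties" idea concrete, here is a minimal sketch of how a parser backend could be selected from a properties file. Everything in it is an assumption made for illustration: the class name HtmlParserSelector, the property key any23.html.parser, and the choice of the nekohtml-based parser as the default during a deprecation period are hypothetical and are not taken from the Any23 code base.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical helper that picks an HTML parser backend from configuration. */
final class HtmlParserSelector {

    /** The two backends discussed in this issue. */
    enum Kind { NEKOHTML, JSOUP }

    /** Hypothetical property key; not an actual Any23 setting. */
    static final String KEY = "any23.html.parser";

    private HtmlParserSelector() { }

    /**
     * Returns the configured backend. Assumption: the legacy nekohtml-based
     * parser stays the default while it is deprecated but not yet removed.
     */
    static Kind select(Properties config) {
        String value = config.getProperty(KEY, "nekohtml")
                .trim()
                .toLowerCase(Locale.ROOT);
        switch (value) {
            case "jsoup":
                return Kind.JSOUP;
            case "nekohtml":
                return Kind.NEKOHTML;
            default:
                // Fail fast on typos instead of silently falling back.
                throw new IllegalArgumentException(
                        "Unknown value for " + KEY + ": " + value);
        }
    }

    /** Loads properties from text in the standard .properties format. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail in practice.
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

With a line such as any23.html.parser=jsoup in the properties file, select(...) would return Kind.JSOUP, and the caller could then hand the document to jsoup (for example via Jsoup.parse) instead of the nekohtml-based TagSoupParser.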

            People

              Assignee: Unassigned
              Reporter: Lewis John McGibbney (lewismc)
              Votes: 1
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: