Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5
    • Component/s: parser
    • Labels:
      None

      Description

      The NekoHTML library we currently use for parsing HTML has a transitive dependency on Apache Xerces. Xerces is fairly large (1.2 MB) and is known to cause problems when it ends up on the classpath of an application or container that expects a different XML parser library.

      The TagSoup library (http://home.ccil.org/~cowan/XML/tagsoup/) provides an alternative HTML parsing library that works pretty much like NekoHTML but doesn't depend on Xerces. I suggest we switch from NekoHTML to TagSoup unless this change causes major regressions in HTML parsing.
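The following is a minimal sketch of why such a switch can be low-risk, assuming the HTML parser is consumed only through the standard SAX `XMLReader` interface. TagSoup's `org.ccil.cowan.tagsoup.Parser` implements `XMLReader`, so handler code written against SAX does not need to know which parser produces the events. The class and method names (`HtmlTextSketch`, `TextCollector`, `extractText`) are hypothetical and not taken from the project. To keep the sketch runnable without extra jars, the JDK's built-in SAX parser stands in for TagSoup, so the sample input has to be well-formed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not the project's actual parser code.
class HtmlTextSketch {

    /** Collects character data from SAX events, whichever parser emits them. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        String text() {
            return text.toString().trim();
        }
    }

    /** Parses the markup with the given reader and returns the text content. */
    static String extractText(XMLReader reader, String markup)
            throws IOException, SAXException {
        TextCollector collector = new TextCollector();
        reader.setContentHandler(collector);
        reader.parse(new InputSource(new StringReader(markup)));
        return collector.text();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader: the JDK's built-in SAX parser (well-formed input only).
        // Assumption: with the TagSoup jar on the classpath, this line would become
        //   XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String markup = "<html><body><p>Hello, <b>world</b></p></body></html>";
        System.out.println(extractText(reader, markup));
    }
}
```

Only the line that constructs the `XMLReader` depends on the parser library, which is the part a NekoHTML-to-TagSoup switch would touch under this assumption. Differences in how the two parsers repair malformed HTML would still need regression testing.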

          People

          • Assignee: Jukka Zitting
          • Reporter: Jukka Zitting
          • Votes: 0
          • Watchers: 0