Nutch / NUTCH-1550

xercesImpl and xmlParserAPIs (org.apache.xml) packages and classes only used in three Nutch classes


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 1.6, 2.1
    • Fix Version/s: 1.7, 2.2
    • Component/s: build, parser
    • Labels: None

    Description

      DOMSerializerImpl from xerces is deprecated in our current artifact. It is replaced by the (still ancient but slightly newer) org.apache.xml.serializer.dom3.LSSerializerImpl in [0].
      Upon closer inspection, find . | xargs grep "org.apache.xml" * only pulled up DOMBuilder, XMLCharacterRecognizer and DOMContentUtilsTest as the places where such classes are used.
      I am confused as to why they are included as primary dependencies within Nutch. Either these XML-specific dependencies should be restricted to parse-html, or they should be removed and replaced by the new artifact [0].
      [0] http://search.maven.org/#artifactdetails|xalan|serializer|2.7.1|jar
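
For illustration only, here is a minimal sketch of serializing a DOM through the standard DOM Level 3 Load/Save interfaces (org.w3c.dom.ls), which is the API that an LSSerializer implementation such as the one named above exposes. This is not code from Nutch: the class name LsSerializeSketch and the sample XML are hypothetical, and it assumes the JDK's default DOM implementation supports the "LS" feature.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of Nutch.
public class LsSerializeSketch {

    // Serializes a DOM document with the DOM Level 3 Load/Save API
    // instead of a Xerces-specific serializer class.
    static String serialize(Document doc) {
        // Assumption: the DOM implementation behind doc supports "LS" 3.0.
        Object feature = doc.getImplementation().getFeature("LS", "3.0");
        if (!(feature instanceof DOMImplementationLS)) {
            throw new IllegalStateException("DOM Load/Save is not supported");
        }
        LSSerializer serializer = ((DOMImplementationLS) feature).createLSSerializer();
        // Omit the <?xml ...?> declaration to keep the output short.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    // Parses a small XML string into a DOM document using the JDK parser.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>text</b></a>");
        System.out.println(serialize(doc));
    }
}
```

Coding against the org.w3c.dom.ls interfaces rather than a concrete org.apache.xml class would keep the import out of the core classes, whichever artifact ends up supplying the implementation.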


          People

            Assignee: lewismc Lewis John McGibbney
            Reporter: lewismc Lewis John McGibbney
            Votes: 0
            Watchers: 1