Apache Any23 (Retired) / ANY23-379

RDFa SAXParseException: invalid XML character

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.3
    • Component/s: extractors
    • Labels: None

    Description

      When running Any23 extraction on the page http://www.bray-sur-seine.fr/les-gagnants-du-concours-de-bd/, I encountered the following exception:

      org.apache.any23.extractor.ExtractionException: Error while parsing RDF document.
      	at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:175)
      	at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:57)
      	at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:471)
      	at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:259)
      	at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:323)
      	at org.apache.any23.extractor.html.AbstractExtractorTestCase.extract(AbstractExtractorTestCase.java:189)
      	at org.apache.any23.extractor.html.AbstractExtractorTestCase.assertExtract(AbstractExtractorTestCase.java:204)
      	... 28 more
      Caused by: org.eclipse.rdf4j.rio.RDFParseException: org.xml.sax.SAXParseException; lineNumber: 205; columnNumber: 52; An invalid XML character (Unicode: 0x8) was found in the element content of the document.
      	at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:111)
      	at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:95)
      	at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:171)
      	... 34 more
      Caused by: org.semarglproject.rdf.ParseException: org.xml.sax.SAXParseException; lineNumber: 205; columnNumber: 52; An invalid XML character (Unicode: 0x8) was found in the element content of the document.
      	at org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1141)
      	at org.semarglproject.source.XmlSource.process(XmlSource.java:50)
      	at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
      	at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
      	at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
      	at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:109)
      	... 36 more
      Caused by: org.xml.sax.SAXParseException; lineNumber: 205; columnNumber: 52; An invalid XML character (Unicode: 0x8) was found in the element content of the document.
      	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
      	at org.semarglproject.source.XmlSource.process(XmlSource.java:48)
      	... 40 more
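The root cause is an XML well-formedness check: U+0008 (BACKSPACE) is a C0 control character that the XML 1.0 `Char` production does not allow, so the XML source used by the RDFa parser rejects the page. The Java sketch below reproduces the same class of error with the JDK's built-in SAX parser (not the Xerces/Semargl stack shown in the trace) and shows one common workaround: stripping disallowed code points before XML parsing. The class `InvalidXmlCharDemo` and its methods are hypothetical illustrations, not the fix that was committed for ANY23-379.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Any23 or Semargl.
public class InvalidXmlCharDemo {

    /** True if code point cp matches the XML 1.0 "Char" production. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of s with every code point outside XML 1.0 "Char" removed
     *  (this also drops unpaired surrogates, which fall in 0xD800-0xDFFF). */
    public static String stripInvalidXmlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().filter(InvalidXmlCharDemo::isXml10Char).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /** Parses xml with the JDK SAX parser; returns null on success,
     *  or the SAXParseException message if the input is not well-formed. */
    public static String parseError(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Element content containing a raw 0x8 (backspace) character.
        String bad = "<p>BD" + (char) 0x8 + "</p>";
        System.out.println("raw input:       " + parseError(bad));
        System.out.println("sanitized input: " + parseError(stripInvalidXmlChars(bad)));
    }
}
```

Dropping the character is only one option; replacing it with a space or U+FFFD keeps character offsets stable, and handing the page to a lenient HTML parser instead of an XML parser is another route. Which is appropriate depends on the extractor.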
      

            People

              Assignee: Hans Brende (hansbrende)
              Reporter: Hans Brende (hansbrende)