Apache Jena / JENA-394

RDF/XML parser incorrectly disallows some Unicode characters

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: Jena 2.10.0
    • Fix Version/s: None
    • Component/s: RDF/XML
    • Labels: None

    Description

      The Unicode character 'KATAKANA MIDDLE DOT' (U+30FB) in the local part of a property name causes a parse exception in the RDF/XML parser. This appears to be incorrect: as far as I can tell, the character is allowed both in IRIs and in XML local names.

      Example file:

      <?xml version="1.0" encoding="utf-8" ?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://example.com/ns#">
      <rdf:Description rdf:about="#this">
      <隣接自治体・行政区 rdf:resource="#that"/>
      </rdf:Description>
      </rdf:RDF>

      The offending character is the “dot” in the middle of the property name.
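
The claim that U+30FB is legal in an XML local name can be checked against the name productions in section 2.3 of XML 1.0 (Fifth Edition). The following is a minimal sketch, not Jena or Xerces code: the function names are hypothetical, and the ranges are transcribed from the NameStartChar and NameChar productions of that edition. Earlier editions of XML 1.0 defined names through explicit character-class tables instead, so a parser built against an earlier edition may give a different answer for this character.

```python
# Code point ranges from XML 1.0 (Fifth Edition), section 2.3.
# NameStartChar production (":" is included here; NCName excludes it below).
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional characters allowed by NameChar after the first position.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),        # "-" and "."
    (0x30, 0x39),        # 0-9
    (0xB7, 0xB7),        # MIDDLE DOT (U+00B7)
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def _in_ranges(code_point, ranges):
    return any(low <= code_point <= high for low, high in ranges)


def is_name_start_char(ch):
    """True if ch matches the Fifth Edition NameStartChar production."""
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch):
    """True if ch matches the Fifth Edition NameChar production."""
    cp = ord(ch)
    return _in_ranges(cp, NAME_START_RANGES) or _in_ranges(cp, NAME_EXTRA_RANGES)


def is_ncname(name):
    """True if name is a valid local name (a Name that contains no colon)."""
    if not name or ":" in name:
        return False
    return is_name_start_char(name[0]) and all(is_name_char(c) for c in name[1:])


if __name__ == "__main__":
    # U+30FB falls inside the range U+3001 to U+D7FF.
    print(is_name_char("\u30fb"))
    print(is_ncname("隣接自治体・行政区"))
```

Under these productions, both the middle dot and the full property name from the example file pass the check.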

      rdfcat execution with stack trace:

      $ bin/rdfcat ~/katakana-middle-dot.xml
      18:09:37 ERROR riot :: Element type "?????" must be followed by either attribute specifications, ">" or "/>".
      Exception in thread "main" org.apache.jena.riot.RiotException: Element type "?????" must be followed by either attribute specifications, ">" or "/>".
      at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:132)
      at org.apache.jena.riot.lang.LangRDFXML$ErrorHandlerBridge.fatalError(LangRDFXML.java:242)
      at com.hp.hpl.jena.rdf.arp.impl.ARPSaxErrorHandler.fatalError(ARPSaxErrorHandler.java:48)
      at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:209)
      at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.fatalError(XMLHandler.java:239)
      at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
      at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
      at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
      at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
      at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
      at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
      at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
      at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
      at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
      at com.hp.hpl.jena.rdf.arp.impl.RDFXMLParser.parse(RDFXMLParser.java:151)
      at com.hp.hpl.jena.rdf.arp.ARP.load(ARP.java:119)
      at org.apache.jena.riot.lang.LangRDFXML.parse(LangRDFXML.java:141)
      at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTFactoryImpl$1.read(RDFParserRegistry.java:148)
      at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:749)
      at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:258)
      at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:244)
      at org.apache.jena.riot.adapters.RDFReaderRIOT.read(RDFReaderRIOT.java:65)
      at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:276)
      at com.hp.hpl.jena.util.FileManager.readModelWorker(FileManager.java:403)
      at com.hp.hpl.jena.util.FileManager.readModel(FileManager.java:342)
      at jena.rdfcat.readInput(rdfcat.java:375)
      at jena.rdfcat$ReadAction.run(rdfcat.java:552)
      at jena.rdfcat.go(rdfcat.java:278)
      at jena.rdfcat.main(rdfcat.java:260)

      Attachments

        1. japanese-chars.xml
          0.3 kB
          Richard Cyganiak
        2. katakana-middle-dot.xml
          0.3 kB
          Richard Cyganiak

          People

            Assignee: Andy Seaborne (andy)
            Reporter: Richard Cyganiak (cygri)
