Details
Type: Bug
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.0.0
Fix Version/s: None
Component/s: None
Description
I am attempting to ingest the latest DBpedia dataset. Rya is erroring out whenever it hits a URI with a hostname that begins with a number. I'm not sure if the problem is in Rya itself or in RDF4J.
2020-05-28 00:53:07,971 ERROR [main -- parser thread] org.apache.rya.accumulo.mr.RdfFileInputFormat: Invalid IRI 'https://9p.io/plan9 [line 36207]
org.eclipse.rdf4j.rio.RDFParseException: Invalid IRI 'https://9p.io/plan9 [line 36207]
at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportError(RDFParserHelper.java:322)
at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportError(AbstractRDFParser.java:684)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.reportError(TurtleParser.java:1309)
at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.resolveURI(AbstractRDFParser.java:387)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseURI(TurtleParser.java:941)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseValue(TurtleParser.java:588)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseObject(TurtleParser.java:474)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseObjectList(TurtleParser.java:412)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parsePredicateObjectList(TurtleParser.java:385)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseTriples(TurtleParser.java:372)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:239)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:201)
at org.apache.rya.accumulo.mr.RdfFileInputFormat$RdfFileRecordReader$2.run(RdfFileInputFormat.java:275)
2020-05-28 00:53:07,972 ERROR [main -- parser thread] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[main -- parser thread,5,main] threw an Exception.
java.lang.RuntimeException: Invalid IRI 'https://9p.io/plan9 [line 36207]
at org.apache.rya.accumulo.mr.RdfFileInputFormat$RdfFileRecordReader$2.run(RdfFileInputFormat.java:280)
Caused by: org.eclipse.rdf4j.rio.RDFParseException: Invalid IRI 'https://9p.io/plan9 [line 36207]
at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportError(RDFParserHelper.java:322)
at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportError(AbstractRDFParser.java:684)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.reportError(TurtleParser.java:1309)
at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.resolveURI(AbstractRDFParser.java:387)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseURI(TurtleParser.java:941)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseValue(TurtleParser.java:588)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseObject(TurtleParser.java:474)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseObjectList(TurtleParser.java:412)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parsePredicateObjectList(TurtleParser.java:385)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseTriples(TurtleParser.java:372)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:239)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:201)
at org.apache.rya.accumulo.mr.RdfFileInputFormat$RdfFileRecordReader$2.run(RdfFileInputFormat.java:275)
2020-05-28 00:53:07,972 ERROR [main -- reader thread] org.apache.rya.accumulo.mr.RdfFileInputFormat: Error processing line 38462 of input
java.io.InterruptedIOException
at java.io.PipedReader.receive(PipedReader.java:187)
at java.io.PipedReader.receive(PipedReader.java:206)
at java.io.PipedWriter.write(PipedWriter.java:150)
at java.io.Writer.write(Writer.java:192)
at java.io.Writer.write(Writer.java:157)
at org.apache.rya.accumulo.mr.RdfFileInputFormat$RdfFileRecordReader$1.run(RdfFileInputFormat.java:249)
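To help narrow down whether the IRI itself is malformed or the parser is being too strict, here is a minimal standalone sketch that uses only the JDK's `java.net.URI`. The class `IriHostCheck` and its methods are hypothetical helpers, not part of Rya or RDF4J, and this check does not exercise RDF4J's own IRI validation. It only shows how a general-purpose URI parser treats a hostname whose first label begins with a digit.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for triage only; not part of Rya or RDF4J.
class IriHostCheck {

    // Returns the host component as parsed by java.net.URI,
    // or null if the string is not a valid URI or has no server-based host.
    static String hostOf(String iri) {
        try {
            return new URI(iri).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // True when the parsed host exists and its first character is a digit,
    // which is the pattern reported in this issue.
    static boolean hostStartsWithDigit(String iri) {
        String host = hostOf(iri);
        return host != null && !host.isEmpty() && Character.isDigit(host.charAt(0));
    }

    public static void main(String[] args) {
        // The IRI from the log above.
        String iri = "https://9p.io/plan9";
        System.out.println("host: " + hostOf(iri));
        System.out.println("starts with digit: " + hostStartsWithDigit(iri));
    }
}
```

If `java.net.URI` accepts the host, that would suggest the input data is fine and the rejection comes from the IRI validation in the parsing stack. A helper like `hostStartsWithDigit` could also be used to count how many IRIs in the dataset are affected before attempting a full ingest.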