Details
Improvement
Status: Closed
Major
Resolution: Fixed
Jena 3.14.0
None
Description
The following code loads 1,000,000 named graphs in about 1 minute on my notebook, but I aborted my attempt to write the data out as TriG after several hours.
Dataset ds = RDFDataMgr.loadDataset("test-data.trig");
RDFDataMgr.write(new NullOutputStream(), ds, RDFFormat.TRIG_PRETTY);
In comparison, writing takes 2 seconds for me with RDFFormat.NQUADS.
The test data I used can be generated with this gendata.sh bash script:
#!/bin/bash
MAX=${1:-10}
echo "@prefix eg: <http://www.example.org/> ."
for i in `seq 1 $MAX`; do
  echo "<urn:graph-$i> { <urn:s-$i> eg:idx $i }"
done
Invoke the script with the number of named graphs to generate; in my case I used:
./gendata.sh 1000000 > test-data.trig
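For anyone without a bash shell at hand, the same test data can be produced from Java. This is only a sketch that mirrors the script's output line by line; the class name `GenData` and its method names are made up for illustration and are not part of Jena.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical Java port of gendata.sh; not part of Jena. */
class GenData {

    /** Writes the prefix line followed by one single-triple named graph per index 1..max. */
    static void write(Writer out, int max) throws IOException {
        out.write("@prefix eg: <http://www.example.org/> .\n");
        for (int i = 1; i <= max; i++) {
            out.write("<urn:graph-" + i + "> { <urn:s-" + i + "> eg:idx " + i + " }\n");
        }
    }

    /** Convenience wrapper for small inputs: returns the generated TriG as a string. */
    static String generate(int max) {
        StringWriter sw = new StringWriter();
        try {
            write(sw, max);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) throws IOException {
        // Same default as the bash script: 10 graphs when no argument is given.
        int max = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        Writer out = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        write(out, max);
        out.flush();
    }
}
```

Run it with the graph count as the only argument and redirect standard output to test-data.trig, as with the bash script.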
With the profiler I could trace the problem to code in TurtleShell.java, which repeatedly collects all one million graph names:
this.graphNames = (dsg != null) ? Iter.toSet(dsg.listGraphNodes()) : null ;
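To illustrate why this gets so slow, here is a plain-Java sketch that does not use Jena at all. It assumes (my assumption, I have not checked exactly how often the line above runs) that the set of graph names is rebuilt once for every graph written. The class `GraphNameScanDemo` and its methods are hypothetical stand-ins; they only count how many names are visited in each variant.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative stand-in only: models the access pattern, not the Jena code itself. */
class GraphNameScanDemo {

    /** Builds n synthetic graph names shaped like those in the test data. */
    static List<String> makeGraphNames(int n) {
        List<String> names = new ArrayList<>(n);
        for (int i = 1; i <= n; i++) {
            names.add("urn:graph-" + i);
        }
        return names;
    }

    /** Assumed slow pattern: the full name set is rebuilt for every graph. */
    static long visitsWhenRecollecting(List<String> graphNames) {
        long visits = 0;
        for (String graph : graphNames) {
            Set<String> all = new HashSet<>();
            for (String name : graphNames) {
                all.add(name);
                visits++;
            }
            all.contains(graph); // stand-in for the per-graph lookup
        }
        return visits;
    }

    /** Alternative pattern: the name set is built once and reused for every graph. */
    static long visitsWhenCollectingOnce(List<String> graphNames) {
        long visits = 0;
        Set<String> all = new HashSet<>();
        for (String name : graphNames) {
            all.add(name);
            visits++;
        }
        for (String graph : graphNames) {
            all.contains(graph); // expected constant-time lookup, no rescan
        }
        return visits;
    }

    public static void main(String[] args) {
        List<String> names = makeGraphNames(2_000);
        System.out.println("recollecting: " + visitsWhenRecollecting(names));
        System.out.println("collect once: " + visitsWhenCollectingOnce(names));
    }
}
```

For n graph names the first variant visits n * n names and the second visits n. Under the stated assumption, one million graphs would mean on the order of 10^12 visits, which would be consistent with a run that does not finish within hours.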