  Apache Jena / JENA-1140

Jena 3.0.1 model halts reading large RDF file partway through


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Jena 3.0.1
    • Fix Version/s: Jena 3.1.0
    • Component/s: Jena, RDF API
    • Labels:
    • Environment:

      Eclipse on Windows 7, 8, Mac

      Description

      When reading a large (~250 MB) Turtle RDF file into a Jena model, progress halts, or slows to the point of being unnoticeable, without execution stopping or crashing. The model is created with org.apache.jena.rdf.model.ModelFactory.createDefaultModel() and loaded with org.apache.jena.rdf.model.Model's read() method (tested with both the String url and InputStream in overloads).

      Reading proceeds until the process uses 1-1.5 GB of RAM; at that point progress halts, but execution neither stops nor crashes. The code below reproduces the behaviour, showing a progress bar for the file being read.

      This has been the case on three systems:

      • My laptop, running Windows 10 with Eclipse Mars.1 Release (4.5.1), build id 20150924-1200
      • My desktop, running Windows 7 with Eclipse Kepler Service Release 2, build id 20140224-627
      • My professor's Mac, also running Eclipse (versions unknown to me)

      All three systems were employing Apache Jena 3.0.1, and all of them experienced the same issue.

      I attempted to raise the JVM's maximum heap size manually with -Xmx3G; however, the result did not change.
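
      One thing worth ruling out here is that the flag never reached the JVM: in Eclipse, -Xmx3G takes effect only when it is placed in the run configuration's VM arguments, not the program arguments. The report does not say where it was set, so this is only a possibility. A minimal sketch for checking the effective limit, using the hypothetical class name HeapCheck and only standard java.lang.Runtime calls:

```java
// Hypothetical helper, not part of the original report: prints the heap
// limit the running JVM actually received.
class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Heap currently in use (allocated minus free), in MiB
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("Used heap: " + usedHeapMiB() + " MiB");
    }
}
```

      If -Xmx3G was applied, the reported maximum should be close to 3072 MiB (the exact figure can be somewhat lower depending on the garbage collector).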

      Switching to Apache Jena 2.7.4, and using the same classes from the com.hp.hpl.jena packages instead of org.apache.jena, fixed the problem on all three systems.

      Here is the Java test code:

      ReadLotsOfRDF.java
      import java.io.BufferedInputStream;
      import java.io.FileInputStream;
      import java.io.InputStream;
      
      // Jena 2.x package names, as posted (the version that works). In Jena 3.x
      // the same classes live in org.apache.jena.rdf.model.
      import com.hp.hpl.jena.rdf.model.Model;
      import com.hp.hpl.jena.rdf.model.ModelFactory;
      
      import javax.swing.JFrame;
      import javax.swing.ProgressMonitorInputStream;
      
      public class ReadLotsOfRDF {
      
      	public static void main(String[] args) throws java.io.IOException {
      		// Parent frame for the progress monitor dialog
      		final JFrame f = new JFrame("Sample");
      
      		Model m = ModelFactory.createDefaultModel();
      		// Wrap the file stream so Swing shows read progress; close it when done
      		try (InputStream in = new BufferedInputStream(
      				new ProgressMonitorInputStream(f, "Progress",
      						new FileInputStream("LSQ-BM.ttl")))) {
      			m.read(in, null, "TTL");
      		}
      		// Number of statements loaded into the model
      		System.out.println(m.size());
      	}
      
      }
      

      The "LSQ-BM.ttl" file can be (and was) retrieved from here.
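
      To tell a true halt from a severe slowdown without relying on a Swing dialog, the file stream could instead be wrapped in a byte counter that logs to the console with elapsed time. The sketch below is an assumption-laden illustration, not part of the original report: CountingInputStream and CountingDemo are hypothetical names, the demo reads stand-in bytes rather than the real file, and bytes passed over with skip() are not counted.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: counts bytes read and logs progress to stderr.
class CountingInputStream extends FilterInputStream {
    private final long reportEvery;              // log each time this many bytes pass
    private final long start = System.nanoTime();
    private long count = 0;
    private long nextReport;

    CountingInputStream(InputStream in, long reportEvery) {
        super(in);
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        this.reportEvery = reportEvery;
        this.nextReport = reportEvery;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) advance(1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) advance(n);
        return n;
    }

    // Update the running total and print one line per threshold crossed
    private void advance(long n) {
        count += n;
        while (count >= nextReport) {
            long seconds = (System.nanoTime() - start) / 1_000_000_000L;
            System.err.println(nextReport + " bytes read after " + seconds + " s");
            nextReport += reportEvery;
        }
    }

    long getCount() {
        return count;
    }
}

class CountingDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in data; in the test program this would wrap
        // new FileInputStream("LSQ-BM.ttl") before it is passed to m.read(...)
        try (CountingInputStream in = new CountingInputStream(
                new ByteArrayInputStream(new byte[2500]), 1000)) {
            byte[] buf = new byte[512];
            while (in.read(buf) != -1) {
                // drain the stream
            }
            System.out.println("Total bytes: " + in.getCount());
        }
    }
}
```

      If the logged byte count keeps rising, however slowly, the parser is still consuming input; if it stops changing while the process stays alive, the read has genuinely stalled.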

        Attachments

        1. data-500K.nt.gz
          4.05 MB
          Andy Seaborne


            People

            • Assignee: Andy Seaborne (andy)
            • Reporter: Håvard Wanvik Stenersen (Manadron)
            • Votes: 0
            • Watchers: 3
