Uploaded image for project: 'Apache Jena'
  1. Apache Jena
  2. JENA-813

In-memory DatasetGraph is extremely slow to iterate over when lots of named graphs are used

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • Jena 2.12.1
    • Jena 2.13.0
    • ARQ
    • None

    Description

      Discovered this accidentally in the course of debugging a test case which seemed to running exceptionally slow. Basically the in-memory DatasetGraph implementation will perform extremely poorly when trying to iterate all quads it contains.

      This problem manifests at relatively small scales and shows a marked difference between small numbers of quads in a single graph versus the quads in individual graphs. In my testing with only a few thousand quads placing them in a single graph versus many individual graphs can see 5/6 orders of magnitude performance difference.

      I will attach code to reproduce and show my log output from running this on my system (Mac OS X Mavericks - Quad Core - 8GB RAM):

      0 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 1000 quads
      57 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
      90 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1 graphs
      123 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.021191 seconds to iterate over 1000 quads in 1 graphs
      123 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
      144 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1000 graphs
      861 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.716750 seconds to iterate over 1000 quads in 1000 graphs
      861 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 2000 quads
      861 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
      885 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 1 graphs
      890 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.005400 seconds to iterate over 2000 quads in 1 graphs
      890 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
      922 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 2000 graphs
      7925 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 7.002919 seconds to iterate over 2000 quads in 2000 graphs
      7925 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 3000 quads
      7925 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
      7944 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 1 graphs
      7948 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.003924 seconds to iterate over 3000 quads in 1 graphs
      7948 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
      7973 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 3000 graphs
      36723 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 28.749688 seconds to iterate over 3000 quads in 3000 graphs
      36723 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 4000 quads
      36723 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
      36727 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 1 graphs
      36730 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000773 seconds to iterate over 4000 quads in 1 graphs
      36730 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
      36741 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 4000 graphs
      111638 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 74.896949 seconds to iterate over 4000 quads in 4000 graphs
      111638 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 5000 quads
      111638 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
      111644 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 1 graphs
      111645 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000989 seconds to iterate over 5000 quads in 1 graphs
      111645 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
      111655 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 5000 graphs
      285026 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 173.370614 seconds to iterate over 5000 quads in 5000 graphs
      

      From debugging my best guess is this is down to how DatasetGraphCollection (and it's hierarchy) build the quads iterator by iterating over the graph names and then iterating over the individual graphs appending the iterators together. Something in the structuring means accessing and appending all those individual iterators is drastically slow compared to iterating a single graph.

      A related bug is that given a sufficiently large number of graphs attempts to iterate will fail with a StackOverflowError, the culprit in this case is IteratorCons.

      Attachments

        1. SlowDatasetGraph.java
          3 kB
          Rob Vesse
        2. SlowDatasetGraph.java
          3 kB
          Rob Vesse

        Activity

          People

            rvesse Rob Vesse
            rvesse Rob Vesse
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: