Apache Jena / JENA-1918

Bad performance of path sequence and path*

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Jena 3.15.0
    • Fix Version/s: Jena 3.16.0
    • Component/s: Jena
    • Labels:
      None

      Description

      I want to run the following SPARQL query against my local Apache Jena instance (a Wikidata dump preloaded into TDB2):

      PREFIX wd: <http://www.wikidata.org/entity/>
      PREFIX wdt: <http://www.wikidata.org/prop/direct/>
      PREFIX wikibase: <http://wikiba.se/ontology#>
      PREFIX p: <http://www.wikidata.org/prop/>
      PREFIX ps: <http://www.wikidata.org/prop/statement/>
      PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
      SELECT ?item ?outflow ?drainageBasin ?coordinates ?elevation ?country
      WHERE {
        ?item wdt:P31/wdt:P279* wd:Q23397 .

        OPTIONAL { ?item wdt:P201 ?outflow . }
        OPTIONAL { ?item wdt:P4614 ?drainageBasin . }
        OPTIONAL { ?item wdt:P625 ?coordinates . }
        OPTIONAL { ?item wdt:P2044 ?elevation . }
        OPTIONAL { ?item wdt:P17 ?country . }
      }
      ORDER BY ?item LIMIT 1 OFFSET 0

      When run on query.wikidata.org (which uses Blazegraph), this query takes 26 seconds to complete. Other queries take about the same time on my local Jena instance as on query.wikidata.org.

      On Apache Jena, this query runs for several hours, using one CPU core and 3-4 GB of memory, and then hits a timeout (the timeout could be increased, but that is not the issue here).

      My question is: why is this so much slower than on Blazegraph? Can this SPARQL query be rewritten for better performance? Can the query optimizer be tuned to run it more efficiently?

      If not, then I consider this a bug, because the query itself should not generate such a large workload. If the query optimizer evaluated the wdt:P31/wdt:P279* path first and then limited the result via the ORDER BY ?item LIMIT 1 OFFSET 0 clause, there would be just one item for which it needs to execute the OPTIONAL { ?item ... } joins.
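
      The evaluation order described above can be written out by hand as a subquery. This is only a sketch of the idea, not a confirmed workaround: it assumes the engine evaluates the inner SELECT before the OPTIONAL joins, and it does not reduce the cost of the path itself, since the inner ORDER BY still has to see every match of wdt:P31/wdt:P279*. The outer ORDER BY and LIMIT are kept so that the result is still a single row, as in the original query.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?item ?outflow ?drainageBasin ?coordinates ?elevation ?country
WHERE {
  # Step 1: evaluate the property path alone and keep only the first item.
  {
    SELECT ?item
    WHERE { ?item wdt:P31/wdt:P279* wd:Q23397 . }
    ORDER BY ?item LIMIT 1
  }

  # Step 2: the OPTIONAL joins now run for that single ?item only.
  OPTIONAL { ?item wdt:P201 ?outflow . }
  OPTIONAL { ?item wdt:P4614 ?drainageBasin . }
  OPTIONAL { ?item wdt:P625 ?coordinates . }
  OPTIONAL { ?item wdt:P2044 ?elevation . }
  OPTIONAL { ?item wdt:P17 ?country . }
}
# An item with several values for one property still yields several rows,
# so the outer LIMIT keeps the output at one row.
ORDER BY ?item LIMIT 1
```

      If this form is fast while the original is not, that would suggest the time is spent in the OPTIONAL joins; if both are slow, the path evaluation itself is the likely bottleneck.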

            People

            • Assignee: Andy Seaborne (andy)
            • Reporter: Jonas Sourlier (yolpsoftware)

            Time Tracking

            • Estimated: Not Specified
            • Remaining: 0h
            • Logged: 20m