Affects Version/s: Jena 3.14.0
Fix Version/s: Jena 3.15.0
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
I have a web app that runs SPARQL CONSTRUCT queries against Fuseki and generates web pages from the results. I noticed that Fuseki started hogging all CPU cores a few hours after it was restarted. It turned out that some of the CONSTRUCT queries take a very long time to complete: at least 40 minutes, probably more, and quite possibly they never finish.
I was able to turn this into a fairly minimal example. I've attached a 1.3MB Turtle file (~29k triples) with all the data necessary to demonstrate the problem.
Start Fuseki like this: ./fuseki-server --file W00067442800.ttl /ds
Then open the Fuseki web UI and run this SPARQL query against the dataset:
If you select Turtle as the content type, the query finishes in around 3 seconds (rendering the result in the browser takes a while longer). If you instead select XML as the format, the query just keeps running, with Fuseki fully occupying a single CPU core. With several such queries running, all the CPU cores are eventually used up.
This can also be demonstrated using curl (with the above query saved as query.rq):
works fine and gives you the Turtle output;
never seems to finish.
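The two requests differ only in the Accept header they send. The following is a hedged sketch of such a comparison, not the exact commands from this report. It assumes Fuseki is listening on its default port 3030 and that the dataset's query endpoint is http://localhost:3030/ds/sparql. The function names and the 60-second client-side limit are illustrative.

```shell
#!/bin/sh
# Sketch only: the endpoint URL and function names are assumptions,
# not taken from the original report.
ENDPOINT="${ENDPOINT:-http://localhost:3030/ds/sparql}"

# Map a short format name to the MIME type sent in the Accept header.
accept_for() {
    case "$1" in
        turtle) echo "text/turtle" ;;
        xml)    echo "application/rdf+xml" ;;
        *)      echo "unknown format: $1" >&2; return 1 ;;
    esac
}

# POST the contents of query.rq to the endpoint, asking for the given
# serialization. --max-time aborts the request on the client side so that
# a hanging query does not block the shell indefinitely.
run_construct() {
    accept=$(accept_for "$1") || return 1
    curl --silent --show-error --max-time 60 \
         --header "Accept: $accept" \
         --data-urlencode "query@query.rq" \
         "$ENDPOINT"
}
```

With these definitions, `run_construct turtle > result.ttl` would request Turtle and `run_construct xml > result.rdf` would request RDF/XML. Note that the client-side limit only ends the HTTP request; it says nothing about whether the server stops working on the query.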
Perhaps even worse, a query timeout setting doesn't help. If I start Fuseki with a 10 second query timeout, i.e. --timeout 10000, it still won't stop the query from hogging the CPU forever. My guess is that the problem lies in the final stage of query processing, when the results are serialized into the requested syntax, and that the timeout is no longer applied at that stage.
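One way to check from the client side whether the server-side timeout takes effect is to bound the request with curl's own --max-time and inspect the exit status. curl exits with 28 when its time limit is reached, and with 22 when --fail is given and the server answers with an HTTP status of 400 or above, which is what a server-side timeout would be expected to produce. This is a minimal sketch under assumptions: the endpoint URL (default port 3030, /ds/sparql), the helper names, and the 30-second default are illustrative and not from the report.

```shell
#!/bin/sh
# Sketch only: the endpoint URL and helper names are assumptions.
ENDPOINT="${ENDPOINT:-http://localhost:3030/ds/sparql}"

# Translate a curl exit status into a short verdict.
#   0  = response received successfully
#   22 = HTTP error status (only reported because --fail is used)
#   28 = curl's own --max-time limit was reached
classify_exit() {
    case "$1" in
        0)  echo "finished" ;;
        22) echo "http-error" ;;
        28) echo "client-timeout" ;;
        *)  echo "error" ;;
    esac
}

# Request RDF/XML with a client-side limit (seconds, default 30) chosen to be
# well above the 10 s server-side timeout, then report what happened.
probe_timeout() {
    status=0
    curl --silent --fail --output /dev/null --max-time "${1:-30}" \
         --header "Accept: application/rdf+xml" \
         --data-urlencode "query@query.rq" \
         "$ENDPOINT" || status=$?
    classify_exit "$status"
}
```

If `probe_timeout 30` prints `client-timeout` while Fuseki was started with --timeout 10000, the request was still open 20 seconds after the server-side limit should have fired.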
I discovered this problem while running Fuseki 3.5.0, but it also happens with the most recent release, 3.14.0.