Apache Jena / JENA-1826

Fuseki RDF/XML response never finishes

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Jena 3.14.0
    • Fix Version/s: Jena 3.15.0
    • Component/s: Fuseki
    • Labels: None
    • Environment: Ubuntu 16.04
      java version "1.8.0_201"
      Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
      Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

    Description

      I have a web app that runs SPARQL CONSTRUCT queries against Fuseki to generate web pages. I noticed that Fuseki started hogging all CPU cores a few hours after it was restarted. It turned out that some of the CONSTRUCT queries take a very long time to complete: at least 40 minutes, probably more, and quite likely they never finish.

      I was able to turn this into a fairly minimal example. I've attached a 1.3MB Turtle file (~29k triples) with all the data necessary to demonstrate the problem.

      Start Fuseki like this: ./fuseki-server --file W00067442800.ttl /ds

      Then open the Fuseki web UI and run this SPARQL query against the dataset:

      PREFIX schema: <http://schema.org/>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      CONSTRUCT {
        <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
        ?o schema:name ?oname ;
          skos:prefLabel ?olabel .
        ?inst ?instprop ?instval .
        ?instval schema:name ?instvalName ;
          skos:prefLabel ?instvalLabel .
      }
      WHERE {
        {
          <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
          OPTIONAL {
            { ?o schema:name ?oname }
            UNION
            { ?o skos:prefLabel ?olabel }
          }
        }
        UNION
        {
          { <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> schema:workExample ?inst }
          OPTIONAL {
            {
              ?inst ?instprop ?instval .
              OPTIONAL {
                { ?instval schema:name ?instvalName }
                UNION
                { ?instval skos:prefLabel ?instvalLabel }
              }
            }
          }
        }
      }
      

      If you select Turtle as the content type, the query will finish in around 3 seconds (plus rendering the result in the browser takes a while). If instead you select XML as the format, the query will just keep running, with Fuseki taking over a single CPU core completely. With several such queries running, all the CPU cores will eventually be used.

      This can also be demonstrated using curl (with the above query saved as query.rq):

      curl -H 'Accept: text/turtle' --data-urlencode "query@query.rq" http://localhost:3030/ds/sparql
      

      works fine and gives you the Turtle output;

      curl -H 'Accept: application/rdf+xml' --data-urlencode "query@query.rq" http://localhost:3030/ds/sparql
      

      never seems to finish.
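
      To compare the two content types without waiting indefinitely, the curl commands above can be approximated by a small script with a client-side timeout. This is only a sketch: the function names and the 30-second default are my own choices, and the endpoint is the one from the curl commands. Note that the timeout here applies to each blocking socket operation rather than to the whole exchange, so it catches a server that sends nothing but not one that trickles data slowly.

```python
import socket
import urllib.error
import urllib.parse
import urllib.request

# Endpoint taken from the curl commands in this report.
ENDPOINT = "http://localhost:3030/ds/sparql"


def build_request(query, accept, endpoint=ENDPOINT):
    """Build a form-encoded POST similar to `curl --data-urlencode query@query.rq`."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint, data=body, headers={"Accept": accept}, method="POST"
    )


def run_with_deadline(query, accept, timeout_s=30.0, endpoint=ENDPOINT):
    """Send the query and return a one-line status string.

    timeout_s (an arbitrary default) limits each blocking socket
    operation, not the total time of the request.
    """
    request = build_request(query, accept, endpoint)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            size = len(response.read())
        return f"{accept}: finished, {size} bytes"
    except (socket.timeout, TimeoutError):
        # Raised when a read blocks longer than timeout_s.
        return f"{accept}: no data within {timeout_s} s"
    except urllib.error.URLError as exc:
        # Connection problems (including connect timeouts) and HTTP errors.
        return f"{accept}: request failed ({exc.reason})"
```

      With the query saved as query.rq, calling run_with_deadline(open("query.rq").read(), accept) once with "text/turtle" and once with "application/rdf+xml" should show whether only the RDF/XML request stalls.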

      Perhaps even worse, a query timeout setting doesn't help. If I start Fuseki with a 10 second query timeout, i.e. --timeout 10000, it still won't stop the query from hogging the CPU forever. I'm guessing that the problem is in the final stage of query processing, when the results just have to be serialized into the correct syntax, and that the timeout is no longer applied at this stage.
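
      To make that guess concrete, here is a toy model of a two-stage pipeline in which only the first stage is subject to a limit. It is not Fuseki or Jena code and says nothing about how they are actually implemented; every name in it (Budget, evaluate, serialize) is hypothetical, and a step counter stands in for the wall-clock timeout.

```python
class QueryTimeout(Exception):
    """Raised when the step budget (a stand-in for --timeout) is exhausted."""


class Budget:
    """Counts work steps; a hypothetical substitute for a wall-clock deadline."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, steps=1):
        self.used += steps
        if self.used > self.limit:
            raise QueryTimeout(f"used {self.used} of {self.limit} steps")


def evaluate(triples, budget):
    """Evaluation stage: every triple is charged, so the stage can be cancelled."""
    matched = []
    for triple in triples:
        budget.charge()
        matched.append(triple)
    return matched


def serialize(triples, cost_per_triple, budget=None):
    """Serialization stage: cancellable only if a budget is passed in.

    Returns the number of work steps performed.
    """
    steps = 0
    for _ in triples:
        if budget is not None:
            budget.charge(cost_per_triple)
        steps += cost_per_triple
    return steps
```

      In this model, evaluating five triples against a budget of 10 succeeds, and an expensive serialize call without the budget then runs to completion no matter how much work it does. Passing the same budget into serialize raises QueryTimeout on the first triple, which is the behaviour I would have expected from the timeout setting.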

      I discovered this problem while running Fuseki 3.5.0, but it happens with the most recent release 3.14.0 as well.

      Attachments

        1. data.nt
          277 kB
          Osma Suominen
        2. W00067442800.ttl
          1.30 MB
          Osma Suominen

            People

              osma Osma Suominen

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h