Apache Jena / JENA-1172

blank nodes can break jena-text

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Jena 3.0.1
    • Fix Version/s: Jena 3.1.0
    • Component/s: Text
    • Labels: None

    Description

      Data with blank node subjects can break the jena-text index.

      For this example I use a typical jena-text configuration that indexes rdfs:label. Then I add this triple:

      _:b0 <http://www.w3.org/2000/01/rdf-schema#label> "blank" .
      

      No error is reported (though I remember seeing WARNINGs in similar situations), and the triple gets indexed.

      When I later execute this query:

      PREFIX text: <http://jena.apache.org/text#>
      SELECT ?s { ?s text:query 'blank' }
      

      I get this error:

      10:22:38 WARN  [5] RC = 500 : java.lang.UnsupportedOperationException: 3ed87b7f14f612ef53788d889f6410d6 is not a URI node
      org.apache.jena.ext.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.UnsupportedOperationException: 3ed87b7f14f612ef53788d889f6410d6 is not a URI node
      	at org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
      	at org.apache.jena.ext.com.google.common.cache.LocalCache.get(LocalCache.java:3937)
      	at org.apache.jena.ext.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
      	at org.apache.jena.atlas.lib.cache.CacheGuava.getOrFill(CacheGuava.java:58)
      	at org.apache.jena.query.text.TextQueryPF.query(TextQueryPF.java:291)
      	at org.apache.jena.query.text.TextQueryPF.variableSubject(TextQueryPF.java:229)
      	at org.apache.jena.query.text.TextQueryPF.exec(TextQueryPF.java:198)
      	at org.apache.jena.sparql.pfunction.PropertyFunctionBase$RepeatApplyIteratorPF.nextStage(PropertyFunctionBase.java:106)
      

      Note that this happens whenever a jena-text query matches a blank node subject, so a single triple with a blank node subject can "taint" the whole index. This is what happens with LCSH, which for whatever reason contains a few hundred blank nodes that have a skos:prefLabel property (among almost 8M triples that generally use URIs for everything).
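
To check whether a dataset contains triples of this kind before indexing it, something like the following rough sketch could be used. It scans N-Triples lines for blank node subjects that carry one of the indexed properties. The function name, the regular expression, and the http://example.org/a URI are illustrative assumptions, not part of jena-text, and the pattern is a simplification rather than a full N-Triples parser.

```python
import re

# Simplified pattern (an assumption, not a full N-Triples parser):
# matches a line whose subject is a blank node (_:label) and captures
# the subject label and the predicate IRI.
BNODE_SUBJECT = re.compile(r'^\s*(_:\S+)\s+<([^>]*)>\s')


def blank_subjects_with_property(lines, indexed_predicates):
    """Yield (subject, predicate) for blank node subjects that use an indexed predicate."""
    for line in lines:
        m = BNODE_SUBJECT.match(line)
        if m and m.group(2) in indexed_predicates:
            yield m.group(1), m.group(2)


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# The first line is the triple from this report; the second uses a
# hypothetical URI subject for contrast.
sample = [
    '_:b0 <http://www.w3.org/2000/01/rdf-schema#label> "blank" .',
    '<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "named" .',
]

hits = list(blank_subjects_with_property(sample, {RDFS_LABEL}))
print(hits)
```

For a dataset such as LCSH, the set of predicates would presumably include the skos:prefLabel IRI instead of, or in addition to, rdfs:label.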

          People

            Assignee: Osma Suominen (osma)
            Reporter: Osma Suominen (osma)