Apache Any23 (Retired) / ANY23-417

Inherent problems with mimetype detection

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3
    • Fix Version/s: 2.8
    • Component/s: mime
    • Labels: None

    Description

      N-Triples is a subset of Turtle, and it is also a subset of N-Quads. Turtle is a subset of TriG.

      However, when we perform mimetype detection on a plain-text file, we sniff only the first few kilobytes of data. Something we initially detect as N-Triples may therefore turn out to be a Turtle, TriG, or N-Quads document, and something we initially detect as Turtle may turn out to be a TriG document.
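
      The truncation problem can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class PrefixSniffDemo, the 4096-byte limit, and the crude hasTurtleDirective check are hypothetical and are not Any23's actual detection code. The sample document is valid N-Triples for its first 200 statements and only then uses a Turtle directive.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixSniffDemo {

    /** Crude, hypothetical heuristic: does the sample contain a Turtle-only directive? */
    static boolean hasTurtleDirective(byte[] sample) {
        String text = new String(sample, StandardCharsets.UTF_8);
        return text.contains("@prefix") || text.contains("@base");
    }

    /** Keep only the first `limit` bytes, as a sniffer with a fixed buffer would. */
    static byte[] head(byte[] data, int limit) {
        return Arrays.copyOf(data, Math.min(limit, data.length));
    }

    /** 200 N-Triples statements (well over 4096 bytes), then a Turtle prefix directive. */
    static byte[] sampleDocument() {
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            doc.append("<http://example.org/s").append(i)
               .append("> <http://example.org/p> \"v\" .\n");
        }
        doc.append("@prefix ex: <http://example.org/> .\n");
        doc.append("ex:a ex:b ex:c .\n");
        return doc.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = sampleDocument();
        // Only the leading N-Triples statements fit in the sniffed prefix.
        System.out.println("directive in first 4096 bytes: "
                + hasTurtleDirective(head(bytes, 4096)));
        // The whole document does contain a Turtle directive.
        System.out.println("directive in whole document:   "
                + hasTurtleDirective(bytes));
    }
}
```

      In this sketch the sniffed prefix looks like N-Triples even though the document as a whole needs a Turtle (or TriG) parser.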

      Therefore, if we detect that the document is Turtle and there is no declared Content-Type, we should probably assume that it is actually TriG, just in case.

      If we can only detect that the document is N-Triples, that presents a problem, because it could also be either Turtle or N-Quads. Which do we choose?
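
      One way to make the subset relationships explicit is a small "widening" table. This is only a sketch under assumptions: the Syntax enum and the widen method are hypothetical names, not Any23 or RDF4J types, and the sketch deliberately does not answer the open question. For N-Triples it returns both candidates rather than picking one.

```java
import java.util.EnumSet;
import java.util.Set;

public class FormatWidening {

    /** Hypothetical enum for this sketch; not an Any23 or RDF4J type. */
    enum Syntax { NTRIPLES, NQUADS, TURTLE, TRIG }

    /**
     * Returns the most general syntaxes a document might turn out to be,
     * given what was detected from its first few kilobytes and assuming
     * no Content-Type was declared.
     */
    static Set<Syntax> widen(Syntax detected) {
        switch (detected) {
            case NTRIPLES:
                // N-Triples is a subset of N-Quads and of Turtle (hence of TriG),
                // so the sniffed prefix alone cannot settle which parser is safe.
                return EnumSet.of(Syntax.NQUADS, Syntax.TRIG);
            case TURTLE:
                // Turtle is a subset of TriG, so TriG is the safe assumption.
                return EnumSet.of(Syntax.TRIG);
            default:
                // N-Quads and TriG have no wider syntax in this sketch.
                return EnumSet.of(detected);
        }
    }

    public static void main(String[] args) {
        for (Syntax s : Syntax.values()) {
            System.out.println(s + " -> " + widen(s));
        }
    }
}
```

      A caller that receives more than one candidate would still need some tie-breaking rule, which is exactly the unresolved part of this issue.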

      Another problem I see is that we are detecting both N3 and Turtle in two separate steps. However, as I understand it, for the purposes of RDF, N3 is essentially a synonym for Turtle. So it doesn't really make sense to use two different detection steps for this. It appears that our N3 detection step is actually detecting N-Triples, which is not at all the same thing.

      (Indeed, in org.eclipse.rdf4j.rio.n3.N3ParserFactory's implementation of getParser() we see: return new TurtleParser())

          People

            Assignee: Unassigned
            Reporter: Hans Brende (hansbrende)