  1. Apache Jena
  2. JENA-12

Turtle Files with a UTF-8 BOM fail to parse

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: Jena 2.11.0
    • Component/s: RIOT
    • Labels: None
    • Environment: Windows 7, latest Sun Java Runtime, Jena 2.6.4

    Description

      If a Turtle file starts with a UTF-8 BOM, Jena refuses to parse it and gives the following error:

      Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException: Lexical error at line 1, column 2. Encountered: "@" (64), after : "\ufeff"
      at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:44)
      at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:21)
      at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:101)
      at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:68)
      at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
      at TurtleWithBOM.main(TurtleWithBOM.java:31)

      The code I used to produce this error was as follows:

      import com.hp.hpl.jena.rdf.model.*;
      import com.hp.hpl.jena.util.FileManager;

      import java.io.*;

      public class TurtleWithBOM
      {
          public static void main(String[] args)
          {
              // create an empty model
              Model model = ModelFactory.createDefaultModel();

              InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
              if (in == null)
              {
                  throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found");
              }

              // read the Turtle file
              model.read(in, "", "TTL");

              // write it to standard out
              model.write(System.out);
          }
      }

      A sample Turtle file used with the above code is attached to this issue.
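      For anyone without access to the attachment, a comparable input file can be produced with a few lines of plain Java. This is only a sketch: the class name WriteTurtleWithBom and the Turtle content are made up for illustration and are not the contents of the attached file. The only property that matters is that the file begins with the bytes EF BB BF followed by a directive starting with "@", which matches the error message above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Jena: writes a Turtle document
// preceded by a UTF-8 byte order mark.
class WriteTurtleWithBom
{
    // The UTF-8 encoding of U+FEFF
    static final byte[] UTF8_BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

    static void write(Path target, String turtle) throws IOException
    {
        try (OutputStream out = Files.newOutputStream(target))
        {
            out.write(UTF8_BOM);
            out.write(turtle.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException
    {
        // Illustrative content only; any Turtle starting with "@prefix" will do
        String turtle = "@prefix ex: <http://example.org/> .\n"
                      + "ex:subject ex:predicate \"object\" .\n";
        write(Paths.get("ttl-with-bom.ttl"), turtle);
    }
}
```

      Running it creates (or overwrites) ttl-with-bom.ttl in the working directory, the same file name the reproduction code opens.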

      The data files come from my software, which is written entirely in .NET. When outputting UTF-8, the default behaviour of .NET is to include the BOM at the start of the file. The BOM is not required for UTF-8, but it is not forbidden either, so I think this should be fixed (if possible) in a future release. I will be modifying my software so that my users can disable output of the BOM if desired.

      Judging by the error message, I expect the same problem also affects N3 files, since as far as I can tell from the stack trace they use the same reader.
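      Until the parser handles this itself, one possible workaround on the caller's side is to strip the BOM from the stream before handing it to the reader. The following is a minimal sketch using only the JDK; the class BomSkipper and its method skipUtf8Bom are hypothetical names, not part of Jena. It peeks at the first three bytes and pushes them back unless they are exactly EF BB BF.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jena
class BomSkipper
{
    // Returns a stream positioned after a leading UTF-8 BOM if one is present,
    // otherwise a stream that yields exactly the original bytes.
    static InputStream skipUtf8Bom(InputStream in) throws IOException
    {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // read() may return fewer bytes than requested, so loop until
        // three bytes are read or the stream ends
        int n = 0;
        while (n < head.length)
        {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0)
            {
                break;
            }
            n += r;
        }

        boolean hasBom = n == 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;

        // Not a BOM: put back whatever was read so no data is lost
        if (!hasBom && n > 0)
        {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException
    {
        // Illustrative input: U+FEFF followed by a Turtle prefix directive
        byte[] data = "\uFEFF@prefix ex: <http://example.org/> .\n"
            .getBytes(StandardCharsets.UTF_8);
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) in.read()); // prints "@"
    }
}
```

      In the reproduction code above, the stream returned by FileManager.get().open(...) could be wrapped with BomSkipper.skipUtf8Bom(in) before the call to model.read(in, "", "TTL"). This has not been tested against Jena 2.6.4 and only handles the UTF-8 BOM, not UTF-16 or UTF-32 marks.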

      Attachments

        1. ttl-with-bom.ttl
          0.1 kB
          Rob Vesse

          People

            Assignee: Andy Seaborne
            Reporter: Rob Vesse
            Votes: 0
            Watchers: 1
