  1. Apache Any23 (Retired)
  2. ANY23-49

N3/NQ parsers ignoring stopAtFirstError flag

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.7.0
    • Component/s: core
    • Labels: None
    • Environment: Any23 0.6.1 and repository

    Description

      The base interface for all RDF parsers (org.openrdf.rio.RDFParser) defines a method setStopAtFirstError. The documentation for this method reads: "Sets whether the parser should stop immediately if it finds an error in the data". This is very useful, as many data sets "out there" contain a number of malformed entries.

      However, as far as I can tell from the current source code (0.6.1 and SVN trunk), the NQuadsParser (org.deri.any23.parser.NQuadsParser) ignores this flag. In its original implementation, it runs through the entire input in an unchecked loop as such:

      while (parseLine(fileReader)) {
          nextRow();
      }

      Now, if parsing any line in a potentially huge file throws an exception, the entire parsing process stops regardless of the setting of the "stopAtFirstError" flag. I propose changing these loops to honor this flag, so that when it is set to "false", the rest of the broken line is discarded and parsing continues with the next line.
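
      The proposed behavior can be illustrated with a minimal, self-contained sketch. This is not Any23 code: the class LenientLineParserSketch, its fields, and the toy validity check in parseLine are all hypothetical and only stand in for a real line-oriented RDF parser. The point is the control flow: a malformed line aborts parsing only when the flag is set; otherwise the error is recorded and the loop moves on to the next line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a line-oriented RDF parser; not Any23 code. */
class LenientLineParserSketch {

    private final boolean stopAtFirstError;
    final List<String> statements = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    LenientLineParserSketch(boolean stopAtFirstError) {
        this.stopAtFirstError = stopAtFirstError;
    }

    /** Parses every line; a bad line aborts only when stopAtFirstError is set. */
    void parse(List<String> lines) {
        int row = 0;
        for (String line : lines) {
            row++;
            try {
                statements.add(parseLine(line));
            } catch (IllegalArgumentException e) {
                if (stopAtFirstError) {
                    // Strict mode: give up on the first malformed line.
                    throw new IllegalStateException("row " + row + ": " + e.getMessage(), e);
                }
                // Lenient mode: record the problem, drop the line, keep going.
                errors.add("row " + row + ": " + e.getMessage());
            }
        }
    }

    /** Toy check (assumption): at least four whitespace-separated tokens and a trailing '.'. */
    private static String parseLine(String line) {
        String trimmed = line.trim();
        if (!trimmed.endsWith(".") || trimmed.split("\\s+").length < 4) {
            throw new IllegalArgumentException("malformed line: " + line);
        }
        return trimmed;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "<s1> <p1> <o1> .",
                "this is broken",
                "<s2> <p2> <o2> .");
        LenientLineParserSketch lenient = new LenientLineParserSketch(false);
        lenient.parse(input);
        System.out.println(lenient.statements.size() + " statements, "
                + lenient.errors.size() + " errors");
    }
}
```

      With the flag set to false, the two well-formed lines are kept and the broken one is reported; with the flag set to true, the same input would fail at the second row.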

      I have implemented this behavior on the latest version of NQuadsParser from SVN (r1601); the source file is attached. I have changed the parseLine() method as follows:

      private boolean parseLine(BufferedReader br) throws IOException,
              RDFParseException, RDFHandlerException {
          // [...]
          try {
              // [...]
              // notifyStatement moved into try block
              notifyStatement(sub, pred, obj, graph);
          } catch (EOS eos) {
              reportFatalError("Unexpected end of line.", row, col);
              throw new IllegalStateException();
          } catch (IllegalArgumentException iae) {
              if (!stopAtFirstError()) {
                  // remove remainder of broken line
                  consumeBrokenLine(br);
                  // notify parse error listener
                  reportError(iae.getMessage(), row, col);
              } else {
                  throw new RDFParseException(iae);
              }
          }
          // [...]
      }

      private void consumeBrokenLine(BufferedReader br) throws IOException {
          char c;
          while (true) {
              mark(br);
              c = readChar(br);
              if (c == '\n') {
                  return;
              }
          }
      }
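
      One case worth checking is a broken line at the very end of the input with no trailing newline. How readChar() signals end of input is not shown in the excerpt above, so the following is only a sketch under the assumption that the plain java.io.Reader.read() contract is used (it returns -1 at end of stream). The class and method names are hypothetical and not part of Any23.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical helper; illustrates skipping a broken line with explicit EOF handling. */
class ConsumeLineSketch {

    /**
     * Discards characters up to and including the next '\n'.
     * Returns false when end of input was reached before a newline.
     */
    static boolean consumeRestOfLine(BufferedReader br) throws IOException {
        int c;
        while ((c = br.read()) != -1) {
            if (c == '\n') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new StringReader("broken tail\nnext line"));
        System.out.println(consumeRestOfLine(br)); // newline found
        System.out.println(br.readLine());         // reading resumes on the following line
        System.out.println(consumeRestOfLine(br)); // end of input, no newline left
    }
}
```

      Returning a boolean lets the caller tell "skipped one line" apart from "ran out of input", so the outer loop can terminate cleanly instead of spinning or throwing.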

      It would be great if this or similar changes would find their way into the various Any23 RDF parsers.

      Attachments

        1. RobustNquadsParser.java
          17 kB
          Hannes Mühleisen

          People

            Assignee: michele.mostarda Michele Mostarda
            Reporter: hfmuehleisen Hannes Mühleisen