Daffodil / DAFFODIL-2684

daffodil-cli splitParse mode

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: CLI

    Description

      A common way to use Daffodil is to first split data off a TCP stream or other input stream, then hand each split (a byte array) to Daffodil to parse as a single message.

      This differs from the current CLI "streaming" mode in how errors are handled. The existing streaming mode cannot tolerate errors: any error halts parsing of the entire stream. The only way to parse an entire stream that mixes correct and malformed data is to use a DFDL schema that accepts even malformed data and creates elements from it (e.g., <invalid>8929AFB3892</invalid>).

      But this is unnatural and adds complexity to the DFDL schema that wouldn't otherwise be needed. 

      The split-and-parse method can continue on to the next message even after a message fails to parse. The only failure that is fatal to the whole processing run is being unable to meaningfully split a message from the data stream.

      So we want a split-and-parse capability in the CLI. Such a mode uses two DFDL schemas: a (very simple) splitter schema and a regular parse schema. The splitter schema does just the minimum needed to split a message from the stream; the byte array produced by the split is then parsed with the regular parse schema.
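
      The control flow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the prototype's code and not the Daffodil API: the 2-byte length prefix stands in for whatever the splitter schema would describe, the ASCII key=value format stands in for the regular parse schema, and every name (split_messages, parse_message, split_and_parse, SplitError, ParseError, frame) is hypothetical.

```python
import struct
from io import BytesIO


class SplitError(Exception):
    """The stream can no longer be split into messages (fatal to the run)."""


class ParseError(Exception):
    """A single message failed to parse (not fatal to the run)."""


def split_messages(stream):
    """Yield one message (bytes) at a time.

    Hypothetical stand-in for the splitter schema: assumes every message is
    preceded by a 2-byte big-endian length.
    """
    while True:
        header = stream.read(2)
        if not header:
            return  # clean end of stream
        if len(header) < 2:
            raise SplitError("truncated length prefix")
        (length,) = struct.unpack(">H", header)
        body = stream.read(length)
        if len(body) < length:
            raise SplitError("truncated message body")
        yield body


def parse_message(message):
    """Hypothetical stand-in for the regular parse schema: ASCII 'key=value'."""
    try:
        key, value = message.decode("ascii").split("=", 1)
    except (UnicodeDecodeError, ValueError) as exc:
        raise ParseError(f"malformed message {message!r}") from exc
    return {key: value}


def split_and_parse(stream):
    """Parse every message, recording failures instead of stopping.

    A ParseError is recorded and the loop moves on; a SplitError is allowed
    to propagate because the rest of the stream cannot be trusted.
    """
    results = []
    for message in split_messages(stream):
        try:
            results.append(("ok", parse_message(message)))
        except ParseError as exc:
            results.append(("error", str(exc)))
    return results


def frame(payload):
    """Prefix a payload with its 2-byte big-endian length."""
    return struct.pack(">H", len(payload)) + payload


# Three messages; the middle one is malformed but still cleanly split.
data = frame(b"a=1") + frame(b"\xff\xfe") + frame(b"b=2")
results = split_and_parse(BytesIO(data))
```

      In this sketch the malformed middle message yields an "error" entry while the messages on either side still parse, which is the behavior the existing streaming mode cannot provide without a schema that accepts malformed data.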

      There is no real symmetric unparser equivalent of this split-and-parse behavior; regular streaming unparsing already works.

      The prototype of this idea is in the splitAndParse subdirectory/project of the openDFDL examples repo on GitHub. That code was authored 100% by mbeckerle (Daffodil PMC) with the intent of contributing it to Daffodil, so there is no issue pulling it, or parts of it, into Daffodil.

      Suggested command line:

      daffodil parse --stream --splitterSchema filename ... other options as per parse. 

      When --stream is specified, the --splitterSchema option is available. If used, it provides the file name of a splitter DFDL schema.

      If the splitter DFDL schema is precompiled, then the options would be:

      daffodil parse --stream --splitterParser binaryfilename ... other options as per parse. 
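
      The option rules above can be sketched as a small argument parser. This is a hedged illustration only: it is not the Daffodil CLI's own option handling, the program name daffodil-sketch is hypothetical, and treating --splitterSchema and --splitterParser as mutually exclusive is an assumption drawn from the description of the two as alternatives.

```python
import argparse


def build_parser():
    """Build a toy parser covering only the options proposed in this issue."""
    parser = argparse.ArgumentParser(prog="daffodil-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    parse_cmd = sub.add_parser("parse")
    parse_cmd.add_argument("--stream", action="store_true")
    # Assumption: a splitter is given either as a schema or as a
    # precompiled parser, never both.
    splitter = parse_cmd.add_mutually_exclusive_group()
    splitter.add_argument("--splitterSchema", metavar="FILE")
    splitter.add_argument("--splitterParser", metavar="FILE")
    return parser


def parse_args(argv):
    """Parse argv and enforce that splitter options require --stream."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if (args.splitterSchema or args.splitterParser) and not args.stream:
        # parser.error prints a usage message and raises SystemExit.
        parser.error("--splitterSchema/--splitterParser require --stream")
    return args


# The splitter file name here is a made-up example.
args = parse_args(["parse", "--stream", "--splitterSchema", "splitter.dfdl.xsd"])
```

      With these rules, a splitter option given without --stream is rejected up front rather than silently ignored.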

          People

            Assignee: Unassigned
            Reporter: Mike Beckerle (mbeckerle)
            Votes: 0
            Watchers: 1