Daffodil / DAFFODIL-2722

Add new dfdl:lengthKind 'dfdlx:patternMatch'


Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: Back End, Diagnostics, Front End
    • Labels: None

    Description

      Many times now, I've run into the problem with lengthKind 'pattern' where a failure to match silently yields a length of 0.

      I've finally run out of patience with it. 

      Consider the idiom used in MIL-STD-2045 and related standards for variable-length strings with a maximum length. If a string uses the full maximum length, no terminator character follows; if it uses fewer characters than the maximum, a DEL character (0x7F) terminates it.

      So, consider a zero-length string. It appears in the data stream as just a DEL character.

      The standard idiom for a string with a maximum length of 20 is:

        

      <xs:element name="value" dfdl:lengthKind="pattern" dfdl:lengthPattern="[^\x7F]{0,19}(?=\x7F)|.{20}">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:maxLength value="20"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <!-- Full-length values have no terminator; shorter values are followed by DEL. -->
      <xs:sequence dfdl:terminator="{if (fn:string-length(./value) eq 20) then '%ES;' else '%DEL;'}"/>
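To make the idiom concrete, here is a minimal Python sketch of how the lengthPattern and the computed terminator together consume the data. It is an illustration only, not Daffodil code: DFDL uses Java regular-expression syntax, and Python's `re` is assumed to behave the same as Java for this pattern on the plain ASCII inputs used here. The function name `parse_value` is hypothetical.

```python
import re

# The lengthPattern from the schema above: up to 19 non-DEL characters
# followed by (but not including) a DEL, or else exactly 20 characters.
LENGTH_PATTERN = re.compile(r"[^\x7F]{0,19}(?=\x7F)|.{20}")
DEL = "\x7F"


def parse_value(data: str) -> tuple[str, int]:
    """Hypothetical model of the idiom: return (value, characters consumed)."""
    # lengthKind 'pattern': the match is anchored at the current position.
    m = LENGTH_PATTERN.match(data)
    value = m.group(0) if m else ""  # a failed match is treated as length 0
    consumed = len(value)
    # The empty sequence's terminator: %ES; when the value is full length,
    # otherwise a DEL must follow the value.
    if len(value) < 20:
        if data[consumed:consumed + 1] != DEL:
            raise ValueError("terminator DEL not found")
        consumed += 1
    return value, consumed
```

For example, `parse_value("ABC\x7Fnext")` returns `("ABC", 4)`, a lone DEL returns `("", 1)`, and 20 characters with no DEL return all 20 characters with no terminator consumed. With short data such as `"ABC"`, the match fails, the value silently becomes empty, and only the terminator check reports an error, which is the short-data situation this issue is about.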

       

      Now suppose this element is encountered near the end of the data, where there is no DEL and fewer than 20 characters remain. That is, the data is short.

      DFDL gives us no way to tell this apart from the case where the data stream really does contain just a DEL terminating a zero-length string: a failed match and a zero-length match both yield a length of 0.

      In both cases, the element named 'value' parses successfully as an empty string.

      In the short-data case, however, the sequence then expects a DEL terminator (because the value is shorter than 20 characters), does not find one, and issues a parse error saying the terminator was not found.

      This is acceptable, but we would get a much better diagnostic if the element itself failed to parse because the pattern found neither a DEL nor 20 characters.
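The distinction that gets lost can be sketched in Python. The regex engine itself reports the two cases differently: `re.match` returns `None` when nothing matches, but returns a match object with an empty span for a zero-length match. The sketch contrasts the current lengthKind 'pattern' behavior, which folds both cases into length 0, with a hypothetical model of the proposed 'dfdlx:patternMatch', assumed here to turn a failed match into a parse error. The function names and the `ParseError` class are illustrative only, not Daffodil APIs, and the proposed semantics are an assumption drawn from the issue title and the discussion above.

```python
import re

LENGTH_PATTERN = re.compile(r"[^\x7F]{0,19}(?=\x7F)|.{20}")


class ParseError(Exception):
    """Hypothetical stand-in for a DFDL parse error."""


def length_pattern(data: str) -> int:
    # Current lengthKind 'pattern': a failed match is not an error; length is 0.
    m = LENGTH_PATTERN.match(data)
    return m.end() if m else 0


def length_pattern_match(data: str) -> int:
    # Assumed 'dfdlx:patternMatch' semantics: a failed match is a parse error,
    # while a successful zero-length match is still a valid length of 0.
    m = LENGTH_PATTERN.match(data)
    if m is None:
        raise ParseError("lengthPattern did not match: neither a DEL "
                         "terminator nor 20 characters were found")
    return m.end()
```

Because `re.match` starts at position 0, `m.end()` is the match length. `length_pattern` returns 0 for both `"\x7F"` and the short input `"ABC"`, whereas `length_pattern_match` returns 0 for `"\x7F"` and raises for `"ABC"`, so the error would be reported at the element rather than at the missing terminator.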

      Looking at the alternatives for improving this, one comes to mind:

      Add a dfdl:assert with testKind 'pattern' at the start of the group, using the same regex to detect whether enough data is present to parse the field.

      This works, but it matches the regex TWICE. The first match exists purely to tell the no-match case apart from the zero-length-match case. It feels very heroic, as in way too complex.

      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <!-- Guard: fail with a clear message if neither alternative of the regex matches. -->
            <dfdl:assert testKind='pattern'
              message="String not found. Neither DEL terminator, nor 20 characters could be parsed."
              testPattern="[^\x7F]{0,19}(?=\x7F)|.{20}"/>
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <xs:element name="value" dfdl:lengthKind="pattern" dfdl:lengthPattern="[^\x7F]{0,19}(?=\x7F)|.{20}">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:maxLength value="20"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:sequence dfdl:terminator="{if (fn:string-length(./value) eq 20) then '%ES;' else '%DEL;'}"/>
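The cost of this workaround can be modeled the same way: the assert evaluates the regex once purely as a guard, and the lengthPattern evaluates the identical regex again to compute the length. This Python sketch just counts those evaluations; `CountingPattern` and `parse_with_assert` are hypothetical names, and the model ignores everything else the parser does.

```python
import re


class CountingPattern:
    """Wraps a compiled regex and counts how many times it is matched."""

    def __init__(self, pattern: str) -> None:
        self.regex = re.compile(pattern)
        self.calls = 0

    def match(self, data: str):
        self.calls += 1
        return self.regex.match(data)


def parse_with_assert(data: str, pat: CountingPattern) -> str:
    # Match 1: the dfdl:assert with testKind='pattern' on the empty sequence.
    if pat.match(data) is None:
        raise ValueError("String not found. Neither DEL terminator, "
                         "nor 20 characters could be parsed.")
    # Match 2: lengthKind 'pattern' re-runs the same regex to get the length.
    m = pat.match(data)
    return m.group(0) if m else ""
```

On well-formed data such as `"ABC\x7F"` the regex runs twice to produce `"ABC"`; on short data the assert's single match fails and produces the clear message.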


          People

            Assignee: Unassigned
            Reporter: Mike Beckerle (mbeckerle)
            Votes: 0
            Watchers: 2
