Daffodil / DAFFODIL-2504

Parse text of non-specified length from TCP hangs needlessly

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Back End
    • Labels: None

    Description

      See tests:

        testDaffodilParseFromNetworkDelimited1
        testDaffodilParseFromNetworkDelimited1b
        testDaffodilParseFromNetworkDelimited2
        testDaffodilParseFromNetworkDelimited2b

      When parsing text from a network TCP stream, the parse should succeed as soon as the parser knows it has matched the longest possible delimiter. It should not require any more characters to be present on the data stream than are needed to determine that match.
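
      A minimal sketch of the desired behavior, in Python rather than Daffodil's Scala and under a simplified model: `SimulatedTcpStream`, `WouldBlock`, and `match_delimiter` are hypothetical names, and a read past the data received so far raises `WouldBlock` where a real socket read would hang. The matcher examines one character at a time and stops as soon as no longer delimiter can still match.

```python
from typing import Optional, Sequence


class WouldBlock(Exception):
    """Raised when a read would wait for characters the peer has not sent yet."""


class SimulatedTcpStream:
    """Hypothetical stand-in for a TCP stream on which only `arrived` has been received.

    Reading past the end raises WouldBlock instead of hanging, so a caller can
    see exactly when a real socket read would block.
    """

    def __init__(self, arrived: str):
        self.arrived = arrived
        self.pos = 0

    def peek(self) -> str:
        if self.pos >= len(self.arrived):
            raise WouldBlock(f"needed a character at offset {self.pos}")
        return self.arrived[self.pos]


def match_delimiter(stream: SimulatedTcpStream, delimiters: Sequence[str]) -> Optional[str]:
    """Return the longest delimiter starting at stream.pos, or None.

    Characters are examined one at a time, and the loop stops as soon as no
    longer delimiter could still match, so no character beyond those needed
    to decide the longest match is ever requested.
    """
    start = stream.pos
    prefix = ""
    best = None
    while True:
        # Delimiters longer than the current prefix that could still match.
        longer = [d for d in delimiters if len(d) > len(prefix) and d.startswith(prefix)]
        if not longer:
            break  # no longer match is possible: stop without reading further
        ch = stream.peek()  # raises WouldBlock if this character has not arrived
        if not any(d.startswith(prefix + ch) for d in longer):
            break  # this character rules out every longer delimiter
        prefix += ch
        stream.pos += 1
        if prefix in delimiters:
            best = prefix
    # Leave the stream just after the longest match (a mark/reset in a real reader).
    stream.pos = start + (len(best) if best else 0)
    return best
```

      With delimiters `,` and `,,`, a lone `,` cannot be accepted until one more character arrives, which is unavoidable; once `,,` has been matched, the function returns without asking for anything further.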

      There are no tests for this yet, but lengthKind 'pattern' presumably has a similar issue: only enough characters should be needed to determine the longest possible match for the regex. For example, suppose dfdl:lengthPattern=".", which matches exactly one character (one byte in a single-byte encoding). Matching it should NOT require more than one character to be available on the TCP stream.

      The arbitrary CharBuffer size of 8 in InputSourceDataInputStream causes this to require around 8 characters of lookahead beyond the last character of the matched delimiter. Resizing it to 2 lets the tests succeed with fewer lookahead characters, but the whole approach needs to be re-examined to determine how much lookahead is really needed and whether it can be avoided in many cases.
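
      To illustrate how a fixed buffer size can turn into extra lookahead, here is a simplified Python model (not Daffodil's actual InputSourceDataInputStream code) of a reader that assumes each refill must obtain a full chunk of `capacity` characters. `FixedChunkReader`, `read_until`, and `WouldBlock` are hypothetical names.

```python
class WouldBlock(Exception):
    """Raised when a read would wait for characters the peer has not sent yet."""


class FixedChunkReader:
    """Hypothetical model of a decoder that always fills a buffer of `capacity` chars.

    `arrived` is everything the peer has sent so far. Each refill requires a
    full chunk, so a read can demand characters past the ones actually needed.
    """

    def __init__(self, arrived: str, capacity: int):
        self.arrived = arrived
        self.capacity = capacity
        self.buffer = ""
        self.filled_to = 0  # offset in `arrived` up to which chars were decoded

    def _refill(self) -> None:
        end = self.filled_to + self.capacity
        if end > len(self.arrived):
            # A real socket read would block here waiting for the rest of the chunk.
            raise WouldBlock(f"chunk needs {end} chars, only {len(self.arrived)} arrived")
        self.buffer += self.arrived[self.filled_to:end]
        self.filled_to = end

    def read_char(self) -> str:
        if not self.buffer:
            self._refill()
        ch, self.buffer = self.buffer[0], self.buffer[1:]
        return ch


def read_until(reader: FixedChunkReader, delimiter: str) -> str:
    """Return the text before a single-character delimiter, consuming the delimiter."""
    text = ""
    while True:
        ch = reader.read_char()
        if ch == delimiter:
            return text
        text += ch
```

      In this model, parsing "abc," completes with `capacity=1`, but with `capacity=8` it needs four more characters after the `,`. A smaller buffer shrinks the extra lookahead without removing it: with `capacity=2`, "abc," completes while "ab," still needs one extra character, depending on where the chunk boundary falls.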

      It is known that a lookahead of 1 character cannot always be avoided. When matching delimiters that use DFDL Character Class Entities able to match a variable number of characters (e.g., WSP+, WSP*, and NL, which can match either CR alone or CRLF), a lookahead of 1 is necessary to know whether the match is complete.
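
      A minimal sketch of why WSP+ needs one character of lookahead, assuming a simulated stream like the one in the earlier model (hypothetical `SimulatedTcpStream` and `WouldBlock`) and a reduced whitespace set rather than the full set the DFDL specification defines for WSP.

```python
class WouldBlock(Exception):
    """Raised when a read would wait for characters the peer has not sent yet."""


class SimulatedTcpStream:
    """Hypothetical stand-in for a TCP stream on which only `arrived` has been received."""

    def __init__(self, arrived: str):
        self.arrived = arrived
        self.pos = 0

    def peek(self) -> str:
        if self.pos >= len(self.arrived):
            raise WouldBlock(f"needed a character at offset {self.pos}")
        return self.arrived[self.pos]


# Assumption: a reduced whitespace set for illustration; the DFDL WSP entity
# covers more characters than these four.
WSP_CHARS = {" ", "\t", "\r", "\n"}


def match_wsp_plus(stream: SimulatedTcpStream) -> str:
    """Greedily match WSP+ at stream.pos; return the matched run ("" means no match).

    The run is only known to be complete after peeking at one character that
    is not whitespace, so one character of lookahead is unavoidable.
    """
    matched = ""
    while True:
        ch = stream.peek()  # raises WouldBlock if the next character has not arrived
        if ch not in WSP_CHARS:
            return matched  # the peeked character is left unconsumed
        matched += ch
        stream.pos += 1
```

      If the received data ends in whitespace, the matcher cannot finish: more whitespace might still arrive, so it must wait for the next character.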

      For regular expressions, which can look ahead an arbitrary finite distance, the amount of lookahead required depends on the specific regex.

      Since some lookahead is required in these cases, fixing this issue for the simpler case of basic delimiters with a fixed number of characters seems relatively low priority.

       

          People

            Assignee: Unassigned
            Reporter: Mike Beckerle (mbeckerle)
            Votes: 0
            Watchers: 1
