Details
Type: New Feature
Status: Open
Priority: Minor
Resolution: Unresolved
Description
Note that there is DFDL workgroup discussion about the implications of asking for a length measured in units of 'characters' (dfdl:lengthUnits 'characters') when the underlying item is not text, or is not entirely text (as with complex types).
There is no issue when the character set encoding is fixed width: one simply takes the data size in bytes or bits and divides by the character width to get the length in characters.
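For the fixed-width case the conversion is pure arithmetic. The following is a minimal Python sketch, assuming the encoding has a known, constant number of bits per character; the function name `length_in_chars_fixed` is hypothetical and not part of any DFDL processor's API.

```python
def length_in_chars_fixed(length_in_bits: int, bits_per_char: int) -> int:
    """Convert a length in bits to a length in characters (fixed-width encoding only)."""
    # A fixed-width encoding gives every character the same size,
    # so the character count is just the bit length divided by that size.
    if length_in_bits % bits_per_char != 0:
        raise ValueError("length is not a whole number of characters")
    return length_in_bits // bits_per_char

# 12 bytes of an 8-bit encoding such as ISO-8859-1 -> 12 characters
print(length_in_chars_fixed(12 * 8, 8))   # 12
# 12 bytes of UTF-32 (32 bits per character) -> 3 characters
print(length_in_chars_fixed(12 * 8, 32))  # 3
```

No data needs to be inspected; only the size and the encoding's character width matter.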
The problem arises with a variable-width encoding such as UTF-8. Measuring length in characters then essentially requires unparsing the data into characters and counting them, or unparsing the data to bits/bytes and then parsing (decoding) those bits/bytes as characters and counting them.
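By contrast, with a variable-width encoding there is no arithmetic shortcut: the bytes must be decoded to find where each character ends. The Python sketch below illustrates the second approach (decode the bytes, then count). The function name `length_in_chars_variable` is hypothetical, and it counts Unicode code points, which is an assumption about what "character" means here. It also shows that the same byte length can yield different character counts, and that bytes which are not text may not decode at all.

```python
def length_in_chars_variable(data: bytes, encoding: str = "utf-8") -> int:
    """Count characters by decoding; variable-width encodings allow no shortcut."""
    # Raises UnicodeDecodeError if the bytes are not valid in this encoding,
    # which is what can happen when the underlying data is not text.
    return len(data.decode(encoding))

text_data = "aé€😀".encode("utf-8")  # 1 + 2 + 3 + 4 bytes in UTF-8
print(len(text_data), length_in_chars_variable(text_data))  # 10 4

# Same byte length (4 bytes), different character counts:
print(length_in_chars_variable(b"abcd"),
      length_in_chars_variable("éé".encode("utf-8")))  # 4 2

binary_data = bytes([0xFF, 0x00, 0x01])  # not valid UTF-8
try:
    length_in_chars_variable(binary_data)
except UnicodeDecodeError as err:
    print("not decodable as UTF-8:", err.reason)
```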
In either case, unless there is a uniform character encoding, the behavior is confusing. Other places in DFDL where data that is not necessarily text may be interpreted as text are lengthKind 'pattern' and the pattern asserts and pattern discriminators used during parsing.