  1. Daffodil
  2. DAFFODIL-1979

UTF-8 decoder doesn't handle 3-byte and 4-byte sequences correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: Back End
    • Labels:
      None

      Description

      The UTF-8 decoder is classifying some valid characters as "overlong" and erroring out.

      The PNG schema on the DFDLSchemas GitHub has one test that runs into this bug on 3-byte Devanagari script characters.

      These are 6 Devanagari characters: e0 a4 b6 e0 a5 80 e0 a4 b0 e0 a5 8d e0 a4 b7 e0 a4 95
      Should be: शीर्षक

      But they are all coming out as substitution characters.
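
      As a cross-check of the expected result, here is a minimal Python sketch that decodes the same bytes. It uses Python's built-in UTF-8 codec purely as a reference decoder; it is not Daffodil code, and the variable names are illustrative.

```python
# The 18 bytes from the report, decoded with Python's built-in UTF-8 codec
# (used here only as a reference decoder; this is not Daffodil code).
data = bytes.fromhex("e0 a4 b6 e0 a5 80 e0 a4 b0 e0 a5 8d e0 a4 b7 e0 a4 95")

# Strict decoding raises UnicodeDecodeError on malformed or overlong input.
text = data.decode("utf-8")

# One entry per decoded character, written as a U+XXXX code point.
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(len(text), code_points)
```

      A strict reference decoder accepts all six 3-byte sequences, which supports the claim that the input is valid and the "overlong" classification is the bug.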

      In 3-byte UTF-8, at least one of the bits marked M below must be non-zero; otherwise the encoding is overlong. Note that one of these bits is in the second byte. This second byte wasn't being tested.

      1110MMMM 10Mxxxxx 10xxxxxx
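
      The 3-byte rule can be sketched in Python as follows. This is an illustration only, not the Daffodil implementation: both function names are hypothetical, and `first_byte_only_check` is an assumed reconstruction of a check that ignores the second byte.

```python
def first_byte_only_check(b0: int) -> bool:
    """Hypothetical incomplete test: inspects only the four M bits of byte 1."""
    return (b0 & 0x0F) == 0


def is_overlong_3byte(b0: int, b1: int) -> bool:
    """Overlong only if all five M bits (1110MMMM 10Mxxxxx) are zero."""
    return (b0 & 0x0F) == 0 and (b1 & 0x20) == 0


# e0 a4 b6 is U+0936, the first character in the report.
print(first_byte_only_check(0xE0))    # True: a valid character is misflagged
print(is_overlong_3byte(0xE0, 0xA4))  # False: the M bit in byte 2 is set
# e0 9f bf would encode U+07FF, which fits in 2 bytes, so it really is overlong.
print(is_overlong_3byte(0xE0, 0x9F))  # True
```

      Every 3-byte sequence that starts with e0 has zero M bits in its first byte, so only the M bit in the second byte separates valid characters from overlong ones.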

      In 4-byte UTF-8, at least one of the bits marked M must be non-zero:

      11110MMM 10MMxxxx 10xxxxxx 10xxxxxx
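
      A minimal Python sketch of the 4-byte rule, under the same caveat: the function name is hypothetical and this is not the Daffodil code. It tests only the overlong condition, not other validity rules such as the U+10FFFF upper limit.

```python
def is_overlong_4byte(b0: int, b1: int) -> bool:
    """Overlong only if all five M bits (11110MMM 10MMxxxx) are zero."""
    return (b0 & 0x07) == 0 and (b1 & 0x30) == 0


# f0 90 80 80 is U+10000, the smallest code point that needs 4 bytes.
print(is_overlong_4byte(0xF0, 0x90))  # False
# f0 8f bf bf would encode U+FFFF, which fits in 3 bytes.
print(is_overlong_4byte(0xF0, 0x8F))  # True
# f4 8f bf bf is U+10FFFF: the M bits in the first byte are non-zero.
print(is_overlong_4byte(0xF4, 0x8F))  # False
```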

            People

            • Assignee:
              dfthompson Dave Thompson
              Reporter:
              mbeckerle Michael Beckerle