Xerces-C++ / XERCESC-2063

A 4-byte UTF-8 character incorrectly fails the maxLength facet.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0, 3.0.1, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.1.4
    • Fix Version/s: None
    • Labels:
      None
    • Environment:
      Windows (Affects all OS)

      Description

      A 4-byte UTF-8 character incorrectly fails the maxLength facet.
      The data is F0 9D 90 80, a 4-byte UTF-8 sequence that represents a single character (U+1D400).
      Running sax2count.exe fails with:
      Error at file input.xml, line 4, char 17
      Message: value '??' has length '2' which exceeds maxLength facet value '1'

      This looks like a limitation, but I could not find any documentation about it in the bug list.

      *Input XML*

      <?xml version="1.1" encoding="UTF-8"?>
      <Root xmlns="http://www.example.org/Test" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.org/Test
      Input.xsd">
      <Data>𝐀</Data>
      </Root>

      *Schema*

      <?xml version="1.0" encoding="UTF-8"?>
      <schema targetNamespace="http://www.example.org/Test" elementFormDefault="qualified" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tns="http://www.example.org/Test">
        <element name="Root">
          <complexType>
            <sequence>
              <element name="Data">
                <simpleType>
                  <restriction base="string">
                    <maxLength value="1"/>
                  </restriction>
                </simpleType>
              </element>
            </sequence>
          </complexType>
        </element>
      </schema>
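
One possible explanation for the reported length of 2, offered as an assumption and not a confirmed diagnosis: Xerces-C++ holds text as UTF-16 code units (XMLCh), and U+1D400 lies outside the Basic Multilingual Plane, so it is stored as a surrogate pair (D835 DC00). If the length facet counts code units instead of characters, a single character would be reported with length 2. The sketch below is standalone C++ and uses no Xerces API. The helper names countCodePoints and countUtf16Units are hypothetical, and both assume well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by counting every byte that is
// not a continuation byte (bit pattern 10xxxxxx).
// Assumption: the input is well-formed UTF-8.
std::size_t countCodePoints(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Count the UTF-16 code units needed for the same text. A lead byte of the
// form 11110xxx starts a 4-byte sequence, i.e. a supplementary character,
// which takes a surrogate pair (2 code units) in UTF-16. Every other
// character takes 1 code unit.
// Assumption: the input is well-formed UTF-8.
std::size_t countUtf16Units(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) == 0x80) {
            continue;  // continuation byte, already accounted for
        }
        n += ((c & 0xF8) == 0xF0) ? 2 : 1;
    }
    return n;
}
```

For the reported bytes "\xF0\x9D\x90\x80", countCodePoints returns 1 and countUtf16Units returns 2, which matches the length shown in the error message.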

    People

    • Assignee: Unassigned
    • Reporter: Greg Iwinski (giwinski)