  1. Daffodil
  2. DAFFODIL-2708

XML String feature in XML Text Infoset Inputter/Outputter

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.4.0
    • Component/s: Back End
    • Labels: None

    Description

      Several users need a way to embed a string that is itself known to be XML text directly in the XML infoset output from parsing, without escaping it.
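      To make the difference concrete, here is a minimal sketch of the parse direction in Python. It is not Daffodil code: the function names and the sample payload are hypothetical, and it only contrasts the ordinary escaped output with the embedded output this ticket asks for.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical value of a string element whose content is known to be XML text.
payload = '<order id="7"><item>tea &amp; milk</item></order>'


def escaped_element(name: str, value: str) -> str:
    """Ordinary behavior: the string is escaped and appears as character data."""
    return f"<{name}>{escape(value)}</{name}>"


def embedded_element(name: str, value: str) -> str:
    """Requested behavior: the string is parsed as XML and embedded as a child element."""
    wrapper = ET.Element(name)
    # ET.fromstring raises ParseError if the value is not well-formed XML.
    wrapper.append(ET.fromstring(value))
    return ET.tostring(wrapper, encoding="unicode")


if __name__ == "__main__":
    print(escaped_element("body", payload))
    print(embedded_element("body", payload))
```

      In the escaped form the payload's markup becomes text (`&lt;order ...`), while in the embedded form `order` is a real child element of `body`.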

      Symmetrically, when unparsing, a string element identified as XML text should absorb the series of XML "events" that make up its content and convert them to a string, which becomes the value of the string element.
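      The unparse direction can be sketched the same way. The following Python example is an illustration only, not the Daffodil inputter: `absorb_xml_string` and the element name `body` are hypothetical, and it assumes the XML string element contains exactly one embedded root element. It consumes parser events and, when the designated element ends, serializes its content back to a string.

```python
import io
import xml.etree.ElementTree as ET


def absorb_xml_string(infoset_xml: str, xml_string_element: str) -> str:
    """Consume XML events and return the serialized content of the
    element that is designated as holding XML text."""
    source = io.BytesIO(infoset_xml.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == xml_string_element:
            children = list(elem)
            if len(children) != 1:
                raise ValueError("expected exactly one embedded root element")
            child = children[0]
            child.tail = None  # drop any whitespace that followed the child
            return ET.tostring(child, encoding="unicode")
    raise ValueError(f"element {xml_string_element!r} not found")


if __name__ == "__main__":
    infoset = (
        "<message><header>h1</header>"
        '<body><order id="7"><item>tea &amp; milk</item></order></body>'
        "</message>"
    )
    print(absorb_xml_string(infoset, "body"))
```

      The returned string is a re-serialization of the events, so it is equivalent XML but not necessarily character-for-character identical to what was originally parsed.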

      Note that the same issue can arise for any popular data format (XML, JSON, etc.) in which Daffodil supports infoset output: the data may contain a string that is already in that representation, and users want it embedded directly rather than escaped as a string.

      For the purposes of this ticket, let's focus on XML only. Other representations could be added subsequently.

      Notes:

      1) on canonicalization - I see no way to avoid strong canonicalization of this XML. If byte-for-byte preservation is needed of things such as character entities (for example, one that encodes a space) or CRLFs, there's just no way to do that (at least that I know of).

      2) XML declaration (the initial "slug line", which looks like a processing instruction) - a way to strip this when it is present in the XML string may be needed. An option to generate it as part of the string when unparsing may also be needed.

      3) An ASCII-only or ISO-8859-1-only option may be needed, where any character outside that character set (other than standard whitespace) is converted to a character entity.

      4) This breaks the idea that the DFDL schema IS the XML Schema of the output Infoset XML from parsing. Rather, to create an XML schema for the resulting data, one would have to replace the DFDL element declaration for the string with an appropriate element reference to the schema of the XML being embedded at that place.

      It is highly recommended that such a DFDL schema contain comments identifying the exact element (namespace + name) that the XML string corresponds to.
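      Notes 1 through 3 can be illustrated with a small Python sketch. None of this is Daffodil's API: every function name here is hypothetical, the declaration pattern is a simplification, and the charset option only handles characters the target charset cannot represent (it does not treat control characters specially).

```python
import re
import xml.etree.ElementTree as ET

# Simplified pattern for an XML declaration at the very start of the text.
_XML_DECL = re.compile(r"\A<\?xml\s.*?\?>\s*", re.DOTALL)


def round_trip(xml_text: str) -> str:
    """Note 1: parse and re-serialize, showing what canonicalization discards."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


def strip_declaration(xml_text: str) -> str:
    """Note 2: remove a leading XML declaration if one is present."""
    return _XML_DECL.sub("", xml_text, count=1)


def add_declaration(xml_text: str, encoding: str = "UTF-8") -> str:
    """Note 2: generate a declaration in front of the XML string."""
    return f'<?xml version="1.0" encoding="{encoding}"?>\n{xml_text}'


def restrict_charset(xml_text: str, charset: str = "ascii") -> str:
    """Note 3: replace characters outside `charset` with numeric character references."""
    return xml_text.encode(charset, "xmlcharrefreplace").decode(charset)


if __name__ == "__main__":
    # The character reference and the CRLF do not survive the round trip.
    print(repr(round_trip("<a>x&#x20;y\r\nz</a>")))
    print(strip_declaration('<?xml version="1.0" encoding="UTF-8"?>\n<a/>'))
    print(add_declaration("<a/>"))
    print(restrict_charset("<n>Zoë €</n>", "ascii"))
    print(restrict_charset("<n>Zoë €</n>", "iso-8859-1"))
```

      The round trip shows why byte-for-byte preservation is not available: once the XML has been parsed, the original spelling of a character reference and the original line endings are no longer known to the serializer.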

      w.r.t. implementation...

      There's some pseudocode for this in the "Example Implementation" section of
      the Runtime Properties proposal:

      https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties#Proposal%3ARuntimeProperties-ExampleImplementation

      This pseudocode uses the ScalaXML InfosetInputter/Outputter as a base for simplicity, but the actual implementation should be based on the XMLTextInfosetInputter/Outputter, since that's what most people use.

          People

            Assignee: Unassigned
            Reporter: Mike Beckerle (mbeckerle)