Spark / SPARK-44265

Built-in XML data source support

Details

    • Type: Umbrella
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Built-in XML data source support

    Description

      XML is a widely used data format. An external spark-xml package (https://github.com/databricks/spark-xml) is available to read and write XML data in Spark. Making spark-xml built-in will provide a better user experience for Spark SQL and Structured Streaming. The proposal is to inline the code from the spark-xml package.

      Here is the link to SPIP
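
      For orientation, here is a minimal sketch of what reading and writing XML could look like once the data source is built in. It assumes a Spark build that ships the `xml` format and an existing SparkSession passed in as `spark`. The option names `rowTag` and `rootTag` are taken from the external spark-xml package; the tag values and the helper names `read_xml` and `write_xml` are hypothetical and used only for illustration.

```python
# Hypothetical sample values: the tag names are illustrative only.
READ_OPTIONS = {"rowTag": "book"}
WRITE_OPTIONS = {"rootTag": "books", "rowTag": "book"}


def read_xml(spark, path, options=None):
    """Load XML from `path`; each `rowTag` element becomes one DataFrame row."""
    reader = spark.read.format("xml")
    for key, value in (options or READ_OPTIONS).items():
        reader = reader.option(key, value)
    return reader.load(path)


def write_xml(df, path, options=None):
    """Write `df` as XML, wrapping `rowTag` elements in one `rootTag` element."""
    writer = df.write.format("xml")
    for key, value in (options or WRITE_OPTIONS).items():
        writer = writer.option(key, value)
    writer.save(path)
```

      With a live session this would be called as `df = read_xml(spark, "books.xml")` followed by `write_xml(df, "out/books")`; both paths are placeholders.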

      Sub-Tasks

        1. Port the initial implementation of Spark XML data source (Sub-task, Resolved, Hyukjin Kwon)
        2. XML: Implement FileFormat Interface (Sub-task, Resolved, Sandip Agarwala)
        3. XML: Update Spark Docs (Sub-task, Resolved, tangjiafu)
        4. XML: Add Python and SparkR binding including Spark Connect (Sub-task, Resolved, Sandip Agarwala)
        5. XML: Add SQL Expressions (Sub-task, Resolved, Unassigned)
        6. XML: Add pyspark.sql.functions (Sub-task, Resolved, Sandip Agarwala)
        7. XML: Spark Connect support (Sub-task, Resolved, Unassigned)
        8. XML: to_xml (Sub-task, Resolved, Sandip Agarwala)
        9. XML: ArrayType and MapType support in from_xml (Sub-task, Resolved, Unassigned)
        10. XML: StructType schema issue in PySpark Connect (Sub-task, Open, Unassigned)
        11. XML: XSD file URL support (Sub-task, Resolved, Sandip Agarwala)
        12. XML: Add XML Options using newOption (Sub-task, Resolved, Sandip Agarwala)
        13. XML: Add support for value in 'rowTag' element (Sub-task, Resolved, Unassigned)
        14. XML: Make 'rowTag' a required option (Sub-task, Resolved, Sandip Agarwala)
        15. XML: keepInnerXmlAsRaw option (Sub-task, Resolved, Unassigned)
        16. XML: Add DecimalType support in schema inference (Sub-task, Resolved, Sandip Agarwala)
        17. XML: Add TimestampNTZType support (Sub-task, Resolved, Sandip Agarwala)
        18. XML: Close InputStreamReader on read completion (Sub-task, Resolved, Sandip Agarwala)
        19. XML: Fix XSD big integer conversion (Sub-task, Resolved, Sandip Agarwala)
        20. XML: Use TypeCoercion.findTightestCommonType for compatibility check (Sub-task, Resolved, Sandip Agarwala)
        21. XML: Validate XML element name on write (Sub-task, Resolved, Sandip Agarwala)
        22. XML: Throw error on multiple XML data source (Sub-task, Resolved, Sandip Agarwala)
        23. XML: Limit size of corrupt record (Sub-task, Resolved, Sandip Agarwala)
        24. XML: Perf optimizations (Sub-task, Resolved, Sandip Agarwala)
        25. XML: Ignore commented Row Tags in XML tokenizer (Sub-task, Resolved, Yousof Hosny)
        26. XML: Ignore commented row tags in XML tokenizer (Sub-task, Resolved, Unassigned)
        27. XML: Change to not support DROPMALFORMED parse mode (Sub-task, Resolved, Unassigned)
        28. XML: Add XmlExpressionsSuite (Sub-task, Resolved, Yousof Hosny)
        29. XML: Add XmlFunctionsSuite (Sub-task, Resolved, Yousof Hosny)
        30. XML: Ignore row tags in CDATA Tokenizer (Sub-task, Resolved, Yousof Hosny)
        31. XML: Stop ignoring CDATA within rows. (Sub-task, Resolved, Unassigned)

          People

            Assignee: Unassigned
            Reporter: Sandip Agarwala (sandip.agarwala)
            Shepherd: Hyukjin Kwon
            Votes: 0
            Watchers: 9