Spark / SPARK-45891

Support Variant data type

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      I propose adding a Variant data type to Spark. It efficiently represents semi-structured values without a user-specified schema. Many users currently depend on JSON expressions to handle JSON data, which often leads to repeated JSON parsing and degraded performance. A major goal of the Variant type is to use a more efficient binary representation internally and avoid repeated JSON parsing, while keeping the flexibility of schemaless JSON data.
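
      The cost of repeated parsing can be illustrated with a minimal plain-Python sketch. This is only an analogy under stated assumptions: it uses the standard `json` module rather than Spark, the sample rows and the `CountingParser` and `get_path` helpers are hypothetical, and a parsed Python object stands in for Variant's binary representation, which it does not reproduce.

```python
import json


class CountingParser:
    """Wraps json.loads and counts how many times text is parsed (hypothetical helper)."""

    def __init__(self):
        self.calls = 0

    def parse(self, text):
        self.calls += 1
        return json.loads(text)


def get_path(value, path):
    """Walk a list of keys/indexes into an already-parsed value (hypothetical helper)."""
    for step in path:
        value = value[step]
    return value


# Hypothetical sample rows, each stored as a JSON string.
rows = [
    '{"user": {"id": 1, "tags": ["a", "b"]}, "ts": 100}',
    '{"user": {"id": 2, "tags": ["c"]}, "ts": 200}',
]
paths = [["user", "id"], ["user", "tags", 0], ["ts"]]

# String-based approach: every field extraction re-parses the row's text.
reparse = CountingParser()
from_strings = [[get_path(reparse.parse(r), p) for p in paths] for r in rows]

# Parse-once approach: convert each row a single time, then extract fields.
once = CountingParser()
parsed_rows = [once.parse(r) for r in rows]
from_parsed = [[get_path(v, p) for p in paths] for v in parsed_rows]

print(reparse.calls, once.calls)
```

      Both approaches extract the same values, but the string-based one parses each row once per extracted field, while the parse-once one parses each row a single time regardless of how many fields are read.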

      Sub-Tasks

        1. Add Variant data type in Spark [Sub-task, Resolved, Chenhao Li]
        2. Implement parse_json [Sub-task, Resolved, Chenhao Li]
        3. Support to_json(variant) [Sub-task, Resolved, Chenhao Li]
        4. Improve parquet schema checks [Sub-task, Resolved, David Cashman]
        5. Add variant_get expression. [Sub-task, Resolved, Chenhao Li]
        6. Disallow comparing variant. [Sub-task, Resolved, Chenhao Li]
        7. Add variant_explode expression. [Sub-task, Resolved, Chenhao Li]
        8. Add schema_of_variant expression. [Sub-task, Resolved, Chenhao Li]
        9. Support cast from variant. [Sub-task, Resolved, Chenhao Li]
        10. Add schema_of_variant_agg expression. [Sub-task, Resolved, Chenhao Li]
        11. Support remaining scalar types in the variant spec. [Sub-task, Resolved, Chenhao Li]
        12. Support cast to variant. [Sub-task, Resolved, Chenhao Li]
        13. Add is_variant_null expression [Sub-task, Resolved, Richard Chen]
        14. Prohibit Hash expressions from hashing Variant type [Sub-task, Resolved, Harsh Motwani]
        15. Add VariantVal for PySpark [Sub-task, Resolved, Gene Pang]
        16. Add support for Variant schema in from_json [Sub-task, Resolved, Harsh Motwani]
        17. Support Variant in JSON scan. [Sub-task, Resolved, Chenhao Li]
        18. Add python and scala dataframe variant expression aliases. [Sub-task, Resolved, Chenhao Li]
        19. Add remaining scalar types to the Python variant library [Sub-task, Resolved, Harsh Motwani]
        20. Implement try_parse_json [Sub-task, Resolved, Harsh Motwani]
        21. Support Generated Column expressions that are `RuntimeReplaceable` [Sub-task, Resolved, Richard Chen]
        22. Add Golden Table Tests for Variant from different engines [Sub-task, Open, Unassigned]
        23. Fix Variant default columns for more complex default variants [Sub-task, Resolved, Richard Chen]
        24. Disable variant from being a part of a map key [Sub-task, Resolved, Harsh Motwani]
        25. Document planned approach to shredding [Sub-task, Resolved, David Cashman]
        26. Avoid storage amplification when accessing sub-Variant [Sub-task, Resolved, David Cashman]
        27. Support variant in `InMemoryTableScan` [Sub-task, Resolved, Richard Chen]
        28. Disable variant input/output from scalar UDFs [Sub-task, Resolved, Richard Chen]
        29. Functions to shred a Variant into components [Sub-task, Open, Unassigned]
        30. Add support for interval types in the Variant spec [Sub-task, Open, Unassigned]
        31. Fix cached Variant with column size greater than 128 KB or individual variant larger than 2 KB [Sub-task, Resolved, Richard Chen]
        32. Mark variant as hive incompatible data type [Sub-task, Resolved, Kent Yao]
        33. Implement to_variant_object expression and make schema_of_variant expressions print OBJECT for Variant Objects [Sub-task, Resolved, Harsh Motwani]
        34. Remove string and binary from metadata in spec [Sub-task, Resolved, David Cashman]
        35. Allow duplicate keys in parse_json. [Sub-task, Resolved, Chenhao Li]
        36. Distinguish logical and physical types in variant spec [Sub-task, Resolved, David Cashman]
        37. Add variant metrics to JSON Scan nodes and Project nodes containing variant-constructor expressions [Sub-task, Open, Unassigned]

          People

            Assignee: Chenhao Li (mashplant)
            Reporter: Chenhao Li (mashplant)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: