  1. Spark
  2. SPARK-45891 Support Variant data type
  3. SPARK-48587

Avoid storage amplification when accessing sub-Variant

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      When a variant_get expression returns a Variant, or a nested type containing a Variant, we currently return the sub-slice of the Variant value together with the full metadata of the original value, even though most of that metadata is probably unnecessary to represent the extracted value. This can be very inefficient if the result is then written to disk (e.g. to a shuffle file or Parquet). We should instead rebuild the value with minimal metadata.
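
      The amplification described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Spark's actual Variant binary encoding or the fix that was merged: here the "metadata" is a plain list of key strings, and the "value" refers to keys by their index in that list. The helper names (encode, decode, get_field_naive, rebuild_minimal) are hypothetical and exist only for this illustration.

```python
def encode(obj):
    """Toy encoder: return (metadata, value).

    metadata is a list of every distinct key string in obj;
    value mirrors obj but replaces each key with its index in metadata.
    """
    keys, index = [], {}

    def walk(node):
        if isinstance(node, dict):
            out = {}
            for k, v in node.items():
                if k not in index:
                    index[k] = len(keys)
                    keys.append(k)
                out[index[k]] = walk(v)
            return out
        if isinstance(node, list):
            return [walk(x) for x in node]
        return node

    return keys, walk(obj)


def decode(metadata, value):
    """Inverse of encode: resolve key indices back to key strings."""
    if isinstance(value, dict):
        return {metadata[i]: decode(metadata, v) for i, v in value.items()}
    if isinstance(value, list):
        return [decode(metadata, v) for v in value]
    return value


def get_field_naive(metadata, value, name):
    """Sub-value access that slices the value but keeps the full metadata."""
    return metadata, value[metadata.index(name)]


def rebuild_minimal(metadata, value):
    """Re-encode so the metadata holds only keys the value actually uses."""
    return encode(decode(metadata, value))


# A wide document where only one small field is extracted.
doc = {"id": 1, "user": {"name": "a"}, "tags": []}
doc.update({f"col{i}": i for i in range(100)})

meta, val = encode(doc)
naive_meta, naive_val = get_field_naive(meta, val, "user")
small_meta, small_val = rebuild_minimal(naive_meta, naive_val)

# Both forms decode to the same logical value, but the naive one
# carries every key of the original document along with it.
print(len(naive_meta), len(small_meta))
```

      In this model the naive result drags along the key list of the entire document, while the rebuilt result keeps only the keys reachable from the extracted field. The trade-off is that rebuilding costs an extra pass over the extracted value.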

            People

              Assignee: David Cashman
              Reporter: David Cashman
              Votes: 0
              Watchers: 2
