Details
New Feature
Status: Resolved
Major
Resolution: Fixed
4.0.0
Description
I propose adding a Variant data type to Spark. It efficiently represents semi-structured values without a user-specified schema. Currently, many users depend on JSON expressions to handle JSON data, which often leads to repeated JSON parsing and degraded performance. One of the major goals of the Variant type is to use a more efficient binary representation internally and avoid repeated JSON parsing, while keeping the flexibility of schemaless JSON data.
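To make the repeated-parsing cost concrete, here is a minimal sketch in plain Python, not Spark code. It assumes a decoded Python object as a stand-in for the binary representation; the actual Variant encoding is not described here, and the names `CountingParser`, `extract_from_string`, and `to_parsed` are hypothetical helpers introduced only for illustration.

```python
import json


class CountingParser:
    """Wraps json.loads and counts how many times a document is parsed."""

    def __init__(self):
        self.parse_count = 0

    def parse(self, text):
        self.parse_count += 1
        return json.loads(text)


def extract_from_string(parser, text, field):
    # JSON-expression style: the raw string is parsed again on every field access.
    return parser.parse(text).get(field)


def to_parsed(parser, text):
    # Variant-like style (stand-in only): parse once and keep the decoded value.
    return parser.parse(text)


rows = ['{"id": 1, "name": "a", "tags": ["x"]}', '{"id": 2, "name": "b"}']
fields = ["id", "name", "tags"]

# Approach 1: every field lookup re-parses the JSON string.
string_parser = CountingParser()
string_results = [
    [extract_from_string(string_parser, row, f) for f in fields] for row in rows
]

# Approach 2: each row is parsed once, then fields are read from the parsed value.
variant_parser = CountingParser()
parsed_rows = [to_parsed(variant_parser, row) for row in rows]
variant_results = [[p.get(f) for f in fields] for p in parsed_rows]

print(string_parser.parse_count, variant_parser.parse_count)
```

Both approaches return the same values and stay schemaless (the second row simply has no `tags` field), but the first parses once per row per field while the second parses once per row.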