Details
Sub-task
Status: Open
Major
Resolution: Unresolved
3.0.0
None
None
Description
Postgres features a number of JSON functions that are missing in Spark: https://www.postgresql.org/docs/9.3/functions-json.html
Redshift's JSON functions (https://docs.aws.amazon.com/redshift/latest/dg/json-functions.html) have partial overlap with the Postgres list.
Some of these functions can be expressed as compositions of existing Spark functions. For example, I think that json_array_length can be expressed with cardinality and from_json, but there's a caveat related to legacy Hive compatibility (see the demo notebook at https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5796212617691211/45530874214710/4901752417050771/latest.html for more details).
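As a rough illustration of the composition mentioned above, the sketch below builds the Spark SQL expression as a string and pairs it with a plain-Python model of what Postgres's json_array_length returns, which could serve as a baseline when triaging. The helper names (spark_sql_json_array_length, reference_json_array_length) and the 'array<string>' schema are assumptions made for this example, not part of Spark or of this ticket, and the SQL expression has not been verified here against any particular Spark version. The Hive-compatibility caveat is not modeled.

```python
import json


def spark_sql_json_array_length(column: str) -> str:
    """Hypothetical helper: build the composition suggested in the ticket.

    Assumes the JSON array can be parsed with an 'array<string>' schema;
    this is an illustrative choice, not a confirmed recommendation.
    """
    return f"cardinality(from_json({column}, 'array<string>'))"


def reference_json_array_length(text: str) -> int:
    """Plain-Python model of Postgres json_array_length.

    Counts the elements of the outermost JSON array and rejects
    non-array input, as the Postgres function does.
    """
    value = json.loads(text)
    if not isinstance(value, list):
        raise ValueError("cannot get array length of a non-array")
    return len(value)


if __name__ == "__main__":
    # Expression one might try in Spark SQL against a column named `payload`.
    print(spark_sql_json_array_length("payload"))
    # Example input taken from the Postgres documentation for json_array_length.
    print(reference_json_array_length('[1,2,3,{"f1":1,"f2":[5,6]},4]'))
```

Comparing the Spark expression's output with the reference model on inputs such as empty arrays, non-array JSON, and malformed strings is one way to surface where the composition diverges from Postgres.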
I'm filing this ticket so that we can triage the list of Postgres JSON features and decide which ones make sense to support in Spark. After we've done that, we can create individual tickets for specific functions and features.