Spark / SPARK-30324

Simplify API for JSON access in DataFrames/SQL


    Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Target Version/s:

      Description

      get_json_object() is a built-in function for extracting fields from JSON strings. It is verbose and hard to use; for example, I did not expect the path to a field to have to start with "$.".
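      To make the "$." requirement concrete, here is a rough pure-Python approximation of how get_json_object resolves a simple path. It is a sketch under stated assumptions, not Spark's implementation: the name get_json_object_sketch is hypothetical, only dot-separated object keys are handled (no array indices or wildcards), and edge cases such as JSON null may differ from Spark.

```python
import json
from typing import Optional


def get_json_object_sketch(json_str: str, path: str) -> Optional[str]:
    """Hypothetical approximation of get_json_object for simple
    dot-separated paths only (no array indices or wildcards)."""
    # The path must be rooted at "$"; anything else yields NULL (None here).
    if not path.startswith("$"):
        return None
    try:
        value = json.loads(json_str)
    except (TypeError, ValueError):
        return None  # malformed JSON yields NULL
    # "$.user.id" -> ["user", "id"]; "$" alone selects the whole document.
    keys = [k for k in path[1:].split(".") if k]
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    # String values come back unquoted; other values as compact JSON text.
    if isinstance(value, str):
        return value
    return json.dumps(value, separators=(",", ":"))


doc = '{"user": {"id": 42, "name": "ada"}}'
print(get_json_object_sketch(doc, "$.user.name"))  # ada
print(get_json_object_sketch(doc, "user.name"))    # None: missing "$."
```

      Forgetting the leading "$." silently produces NULL rather than an error, which is part of what makes the function easy to misuse.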

      We can simplify all of this when a column is of StringType and a nested field is requested on it. The query planner would rewrite this syntactic sugar into a call to get_json_object.
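      The proposed rewrite can be sketched as a string-level transformation. This is illustrative only and does not use Spark's Catalyst APIs: the function rewrite_nested_access, the dictionary-based schema, and the "column.field.field" output for non-string columns are hypothetical stand-ins for what a planner rule would do on real expression trees.

```python
from typing import Dict, List


def rewrite_nested_access(column: str, fields: List[str],
                          schema: Dict[str, str]) -> str:
    """Hypothetical planner rule: turn a nested field access on a
    StringType column into an equivalent get_json_object(...) call."""
    if not fields:
        raise ValueError("at least one nested field is required")
    if schema.get(column) != "StringType":
        # Non-string columns (e.g. structs) keep ordinary nested access.
        return ".".join([column, *fields])
    # Build the "$."-rooted path the user would otherwise have to write.
    json_path = "$." + ".".join(fields)
    return f"get_json_object({column}, '{json_path}')"


schema = {"payload": "StringType", "info": "StructType"}
print(rewrite_nested_access("payload", ["user", "id"], schema))
print(rewrite_nested_access("info", ["user", "id"], schema))
```

      A real rule would also need to escape field names containing dots or quotes, which this sketch ignores.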

      This nested access can then be extended in the future to other semi-structured formats.

              People

              • Assignee: Unassigned
              • Reporter: Burak Yavuz (brkyvz)
              • Votes: 0
              • Watchers: 1
