SPARK-4190

Allow users to provide transformation rules at JSON ingest

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.1.0, 1.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      It would be great if it were possible to provide transformation rules (to be executed within jsonRDD or jsonFile) so that users could

      (1) deal with JSON files that confound schema inference or are otherwise insufficiently disciplined, or
      (2) simply perform arbitrary object transformations at ingest before a schema is inferred.

      json4s, which Spark already uses, has nice interfaces for specifying transformations as partial functions on objects and accessing nested structures via path expressions. (We might want to introduce an abstraction atop json4s for a public API, but the json4s API seems like a good first step.) There are some examples of these transformations at https://github.com/json4s/json4s and at http://chapeau.freevariable.com/2014/10/fedmsg-and-spark.html
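
To make the request concrete, here is a minimal sketch of what a transformation rule applied at ingest might look like. It is written in plain Python with only the standard library and is not the json4s API or anything Spark provides; `transform`, `normalize_tags`, `ingest`, the `UNCHANGED` sentinel, and the `tags` field are all hypothetical names chosen for illustration. The rule plays the role of a partial function: it returns a replacement for the nodes it cares about and leaves every other node alone. The example addresses case (1), a field that is sometimes a string and sometimes an array.

```python
import json

# Sentinel meaning "this rule does not apply to this node"
# (a stand-in for a partial function being undefined at that input).
UNCHANGED = object()


def transform(value, rule):
    """Apply `rule` to every node of a parsed JSON value, children first."""
    if isinstance(value, dict):
        value = {k: transform(v, rule) for k, v in value.items()}
    elif isinstance(value, list):
        value = [transform(v, rule) for v in value]
    replaced = rule(value)
    return value if replaced is UNCHANGED else replaced


def normalize_tags(node):
    """Hypothetical rule: make the "tags" field an array in every object."""
    if isinstance(node, dict) and isinstance(node.get("tags"), str):
        return {**node, "tags": [node["tags"]]}
    return UNCHANGED


def ingest(lines, rule):
    """Rewrite each JSON line with `rule` before any schema is inferred."""
    return [json.dumps(transform(json.loads(line), rule), sort_keys=True)
            for line in lines]


if __name__ == "__main__":
    raw = [
        '{"id": 1, "tags": "a"}',
        '{"id": 2, "tags": ["b", "c"], "child": {"tags": "d"}}',
    ]
    for line in ingest(raw, normalize_tags):
        print(line)
```

After the rewrite, every record carries `tags` as an array, so a schema inferred from the output would no longer see conflicting types for that field. The proposal in this issue is for jsonRDD or jsonFile to accept such a rule directly, instead of requiring users to pre-process the input themselves.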

            People

              Assignee: William Benton (willbenton)
              Reporter: William Benton (willbenton)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: