SPARK-5443: jsonRDD with schema should ignore sub-objects that are omitted in schema

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Reading the code for jsonRDD, it appears that all fields of a JSON object are read into a Row regardless of the provided schema. I would expect it to be more efficient to store in the Row only those fields that are explicitly included in the schema.

      For example, assume that I only wish to extract the "id" field of a tweet. If I provide a schema that has just one field, named "id", within a map, then the Row object should store only that field within a map.
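
      The requested behavior can be sketched in plain Python. This is only an illustration of schema-directed projection, not Spark's implementation: the schema notation (a dict of field names, with None marking a leaf), the helper names project and parse_with_schema, and the sample tweet are all assumptions made up for this sketch.

```python
import json

# Hypothetical schema notation for this sketch: a dict maps field names to
# sub-schemas; None marks a leaf whose value is kept as-is.
TWEET_SCHEMA = {"id": None, "user": {"screen_name": None}}


def project(value, schema):
    """Keep only the parts of a parsed JSON value that the schema names."""
    if schema is None:
        return value  # leaf: keep whatever was parsed
    if isinstance(value, list):
        # Apply the same element schema to every item of an array.
        return [project(item, schema) for item in value]
    if not isinstance(value, dict):
        return None  # schema expects an object but the data holds something else
    # Fields absent from the record become None; fields absent from the
    # schema are never copied into the result.
    return {
        name: project(value[name], sub) if name in value else None
        for name, sub in schema.items()
    }


def parse_with_schema(line, schema):
    """Parse one JSON record and retain only the schema's fields."""
    return project(json.loads(line), schema)


# Made-up sample record with fields the schema does not mention.
record = '{"id": 42, "text": "hello", "user": {"screen_name": "a", "followers": 7}}'
row = parse_with_schema(record, TWEET_SCHEMA)
print(row)  # {'id': 42, 'user': {'screen_name': 'a'}}
```

      Note that json.loads still materializes the whole record here, so the sketch only saves space in the stored row. A streaming parser could go further and skip unwanted sub-objects without building them at all.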

          People

            Assignee: Nathan Howell (NathanHowell)
            Reporter: Derrick Burns (derrickburns)
            Votes: 0
            Watchers: 3

              Time Tracking

                Original Estimate: 168h
                Remaining Estimate: 168h
                Time Spent: Not Specified