[SPARK-5443] jsonRDD with schema should ignore sub-objects that are omitted in schema - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.2.0
Fix Version/s: 1.4.0
Component/s: SQL
Labels: None

Description

Reading the code for jsonRDD, it appears that all fields of a JSON object are read into a Row regardless of the schema that is provided. I would expect it to be more efficient to store in the Row only those fields that are explicitly included in the schema.

For example, assume that I only wish to extract the "id" field of a tweet. If I provided a schema containing just one field named "id", then the Row would store only that field, and sub-objects omitted from the schema would be ignored.
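The requested behaviour can be illustrated outside Spark as a schema-driven projection of a parsed JSON object. This is a minimal sketch and not jsonRDD's implementation. The function `prune_to_schema` and the nested-dict schema format are hypothetical. In that format, `None` marks a leaf field and a dict marks a sub-object. Fields missing from the JSON are returned as `None`, which is an assumption meant to mirror SQL null.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical schema representation (not Spark's StructType): a dict mapping
# field name to None (leaf value kept as-is) or to a nested dict (sub-object).
Schema = Dict[str, Optional[dict]]


def prune_to_schema(obj: Any, schema: Schema) -> Optional[Dict[str, Any]]:
    """Keep only the fields of a parsed JSON object that appear in `schema`.

    Fields absent from the schema, including whole sub-objects, are dropped.
    Fields present in the schema but missing from the object become None.
    """
    if not isinstance(obj, dict):
        # A struct field whose JSON value is not an object is treated as missing.
        return None
    row: Dict[str, Any] = {}
    for name, sub_schema in schema.items():
        value = obj.get(name)
        if sub_schema is None:
            row[name] = value  # leaf field: keep the value unchanged
        elif value is None:
            row[name] = None  # requested sub-object is absent
        else:
            row[name] = prune_to_schema(value, sub_schema)  # recurse into struct
    return row


tweet = json.loads(
    '{"id": 42, "text": "hello", "user": {"id": 7, "name": "alice"}}'
)

# Only "id" is requested, so "text" and the "user" sub-object are discarded.
print(prune_to_schema(tweet, {"id": None}))  # {'id': 42}

# A nested request keeps just user.id from the "user" sub-object.
print(prune_to_schema(tweet, {"id": None, "user": {"id": None}}))
```

This sketch still parses the whole document with `json.loads` and prunes afterwards. It therefore only shows which fields end up in the row. An implementation aiming at the efficiency gain described above would presumably skip unwanted fields while tokenizing the input.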

Issue Links

Links to: [GitHub] Pull Request #5801 (NathanHowell)

People

Assignee: Nathan Howell
Reporter: Derrick Burns
Votes: 0
Watchers: 3

Dates

Created: 28/Jan/15 04:08
Updated: 07/May/15 06:01
Resolved: 07/May/15 06:01

Time Tracking

Estimated: 168h
Remaining: 168h
Logged: Not Specified