[SPARK-18352] Parse normal, multi-line JSON files (not just JSON Lines) - ASF JIRA

XML

Word

Printable

JSON

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.2.0
Component/s: SQL
Labels:
- releasenotes

Target Version/s:

2.2.0

Description

Spark currently can only parse JSON files that are JSON lines, i.e. each record has an entire line and records are separated by new line. In reality, a lot of users want to use Spark to parse actual JSON files, and are surprised to learn that it doesn't do that.

We can introduce a new mode (wholeJsonFile?) in which we don't split the files, and rather stream through them to parse the JSON files.

Attachments

Issue Links

is duplicated by

SPARK-10840 SparkSQL doesn't work well with JSON

Closed

SPARK-17969 I think it's user unfriendly to process standard json file with DataFrame

Closed

SPARK-7366 Support multi-line JSON objects

Closed

is related to

SPARK-20980 Rename the option `wholeFile` to `multiLine` for JSON and CSV

Resolved

relates to

SPARK-16496 Add wholetext as option for reading text in SQL.

Resolved

SPARK-7366 Support multi-line JSON objects

Closed

links to

[Github] Pull Request #16386 (NathanHowell)

[Github] Pull Request #17128 (felixcheung)

(1 relates to, 2 links to)

Sub-Tasks

Writing to a text DataSource buffers one or more lines in memory

Resolved

Nathan Howell

Activity

People

Assignee:: Nathan Howell

Reporter:: Reynold Xin

Votes:: 0 Vote for this issue

Watchers:: 15 Start watching this issue

Dates

Created:: 08/Nov/16 07:08

Updated:: 12/Dec/22 18:11

Resolved:: 17/Feb/17 04:51