Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version: 2.3.0
- Fix Version: None
Description
Currently the JSON parser is forced to read JSON files in UTF-8. This behavior breaks backward compatibility with Spark 2.2.1 and earlier versions, which could read JSON files in UTF-16, UTF-32, and other encodings thanks to the auto-detection mechanism of the Jackson library. We need to give users back the ability to read JSON files in a specified charset and/or to detect the charset automatically, as before.
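For context, the auto-detection the issue refers to is charset sniffing on the raw bytes of the file. A minimal Python sketch of the BOM-based part of such detection is below; it is an illustration only, not Jackson's actual algorithm, which also infers the charset from zero-byte patterns when no BOM is present.

```python
import codecs
import json

def detect_json_encoding(data: bytes) -> str:
    """BOM-based charset detection sketch (illustrative, not Jackson's code)."""
    # UTF-32 BOMs must be checked first: the UTF-16 LE BOM (FF FE) is a
    # prefix of the UTF-32 LE BOM (FF FE 00 00).
    if data.startswith(codecs.BOM_UTF32_LE) or data.startswith(codecs.BOM_UTF32_BE):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return "utf-8"  # BOM-less input: fall back to UTF-8

raw = '{"name": "spark"}'.encode("utf-16")   # encoding prepends a UTF-16 BOM
encoding = detect_json_encoding(raw)         # "utf-16"
record = json.loads(raw.decode(encoding))    # {'name': 'spark'}
```

A parser fixed to UTF-8 skips this step entirely, which is exactly the regression described: UTF-16 and UTF-32 inputs that 2.2.1 handled transparently now fail to parse.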
Attachments
1. New encoding option for json datasource | Resolved | Max Gekk
2. Custom record separator for jsons in charsets different from UTF-8 | Resolved | Max Gekk
3. Improve Hadoop's LineReader to support charsets different from UTF-8 | Resolved | Unassigned
4. Support lineSep format independent from encoding | Resolved | Unassigned
5. Tests for Hadoop's LinesReader | Resolved | Max Gekk