Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version: 2.3.0
- Fix Version: None
Description
Currently the JSON parser is forced to read JSON files in UTF-8. This behavior breaks backward compatibility with Spark 2.2.1 and earlier versions, which could read JSON files in UTF-16, UTF-32, and other encodings thanks to the auto-detection mechanism of the Jackson library. We need to give users back the ability to read JSON files in a specified charset and/or to detect the charset automatically, as before.
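For context, the auto-detection the issue refers to is charset sniffing on the raw bytes of the file. A minimal Python sketch of the BOM-based part of such detection is below; it is an illustration only, not Jackson's actual algorithm, which also infers the charset from zero-byte patterns when no BOM is present.

```python
import codecs
import json

def detect_json_encoding(data: bytes) -> str:
    """BOM-based charset detection sketch (illustrative, not Jackson's code)."""
    # UTF-32 BOMs must be checked first: the UTF-16 LE BOM (FF FE) is a
    # prefix of the UTF-32 LE BOM (FF FE 00 00).
    if data.startswith(codecs.BOM_UTF32_LE) or data.startswith(codecs.BOM_UTF32_BE):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return "utf-8"  # BOM-less input: fall back to UTF-8

raw = '{"name": "spark"}'.encode("utf-16")   # encoding prepends a UTF-16 BOM
encoding = detect_json_encoding(raw)         # "utf-16"
record = json.loads(raw.decode(encoding))    # {'name': 'spark'}
```

A parser fixed to UTF-8 skips this step entirely, which is exactly the regression described: UTF-16 and UTF-32 inputs that 2.2.1 handled transparently now fail to parse.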
Attachments
1. New encoding option for json datasource | Resolved | Max Gekk
2. Custom record separator for jsons in charsets different from UTF-8 | Resolved | Max Gekk
3. Improve Hadoop's LineReader to support charsets different from UTF-8 | Resolved | Unassigned
4. Support lineSep format independent from encoding | Resolved | Unassigned
5. Tests for Hadoop's LinesReader | Resolved | Max Gekk