Spark / SPARK-16548

java.io.CharConversionException: Invalid UTF-32 character prevents me from querying my data


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 2.2.0, 2.3.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Basically, when I query my JSON data I get:

      java.io.CharConversionException: Invalid UTF-32 character 0x7b2265(above 10ffff)  at char #192, byte #771)
      	at com.fasterxml.jackson.core.io.UTF32Reader.reportInvalid(UTF32Reader.java:189)
      	at com.fasterxml.jackson.core.io.UTF32Reader.read(UTF32Reader.java:150)
      	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.loadMore(ReaderBasedJsonParser.java:153)
      	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:1855)
      	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:571)
      	at org.apache.spark.sql.catalyst.expressions.GetJsonObject$$anonfun$eval$2$$anonfun$4.apply(jsonExpressions.scala:142)
      

      I do not like this behaviour. If one JSON document among 100500 cannot be parsed, please return null for it rather than failing the whole query. I have a dirty one-line fix, and I understand how to make it more reasonable. What is our position: what behaviour do we want?
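
      For illustration only, this is roughly the shape of the one-line guard described above: wrap the Jackson parse in a try/catch and map a CharConversionException to a null result instead of letting it abort the query. The helper below (SafeJsonParse.firstFieldName) is a hypothetical, self-contained sketch; the actual change would live in GetJsonObject.eval in jsonExpressions.scala.

      	import java.io.CharConversionException
      	import com.fasterxml.jackson.core.JsonFactory

      	object SafeJsonParse {
      	  private val jsonFactory = new JsonFactory()

      	  // Hypothetical helper: read the first field name of a JSON object,
      	  // returning None when Jackson mis-detects the record's encoding as
      	  // UTF-32 and throws CharConversionException.
      	  def firstFieldName(jsonBytes: Array[Byte]): Option[String] = {
      	    try {
      	      val parser = jsonFactory.createParser(jsonBytes)
      	      try {
      	        parser.nextToken()                 // expect START_OBJECT
      	        Option(parser.nextFieldName())     // null if not an object
      	      } finally parser.close()
      	    } catch {
      	      // Corrupt record: treat as null instead of failing the whole job.
      	      case _: CharConversionException => None
      	    }
      	  }
      	}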

        Attachments

        1. corrupted.json (3.24 MB, Bijith Kumar)



              People

              • Assignee: Unassigned
              • Reporter: Egor Pahomov (epahomov)
              • Votes: 0
              • Watchers: 6
