Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Done
Description
For a jsonRDD like this:
""" {a: 1} """
""" {b: 2} """
""" {c: 3} """
""" {d: 4} """
""" {e: 5} """
This creates a StructType with 5 fields, each field coming from a different row. That becomes a problem when the RDD is large: a StructType with thousands or millions of fields is hard to work with (it will cause a stack overflow during serialization).
This case should use MapType instead. We need a clear rule for deciding whether StructType or MapType is used for a dict in JSON data.
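To make the problem concrete, here is a minimal pure-Python sketch, not Spark's implementation. It merges the top-level keys of every row the way a struct schema would grow, then applies a hypothetical threshold rule to choose between a struct and a map. The names `merged_keys`, `choose_type`, and `MAX_STRUCT_FIELDS` are illustrative assumptions, and the keys are quoted so that `json.loads` accepts the rows.

```python
import json

# Hypothetical cutoff, not a Spark setting: above this many distinct keys,
# treat the objects as a map rather than a struct.
MAX_STRUCT_FIELDS = 3


def merged_keys(rows):
    """Union of top-level keys across all rows, in first-seen order."""
    keys = {}
    for row in rows:
        for key in json.loads(row):
            keys.setdefault(key, None)
    return list(keys)


def choose_type(rows, max_struct_fields=MAX_STRUCT_FIELDS):
    """Return ("struct", keys) or ("map", None) under the hypothetical rule."""
    keys = merged_keys(rows)
    if len(keys) > max_struct_fields:
        return ("map", None)
    return ("struct", keys)


rows = ['{"a": 1}', '{"b": 2}', '{"c": 3}', '{"d": 4}', '{"e": 5}']
print(merged_keys(rows))   # ['a', 'b', 'c', 'd', 'e']
print(choose_type(rows))   # ('map', None)
print(choose_type(['{"x": 1, "y": 2}', '{"x": 3}']))  # ('struct', ['x', 'y'])
```

With one distinct key per row, the merged field list grows linearly with the number of rows, which is the situation described above. A count-based cutoff is only one possible rule; the right criterion is exactly what this issue asks to define.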
Issue Links
- is duplicated by: SPARK-5936 Automatically convert a StructType to a MapType when the number of fields exceed a threshold. (Resolved)
- relates to: SPARK-4190 Allow users to provide transformation rules at JSON ingest (Resolved)