Spark / SPARK-4476

Use MapType for dicts in JSON that have unique keys in each row


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      For a jsonRDD like this:

      """ {"a": 1} """
      """ {"b": 2} """
      """ {"c": 3} """
      """ {"d": 4} """
      """ {"e": 5} """

      Schema inference will create a StructType with 5 fields, each coming from a different row. This becomes a problem when the RDD is large: a StructType with thousands or millions of fields is hard to work with (it can cause a stack overflow during serialization).
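      To make the growth concrete, here is a minimal pure-Python sketch that mimics struct-style inference by taking the union of keys across all rows. It is not Spark's actual inference code (which also merges value types); the helper name `infer_struct_fields` is hypothetical.

```python
import json


def infer_struct_fields(lines):
    """Mimic struct-style inference: every distinct key becomes a field.

    Hypothetical helper for illustration only; Spark's real inference also
    merges the value types of each field, which is omitted here.
    """
    fields = set()
    for line in lines:
        fields.update(json.loads(line).keys())
    return sorted(fields)


rows = ['{"a": 1}', '{"b": 2}', '{"c": 3}', '{"d": 4}', '{"e": 5}']
print(infer_struct_fields(rows))  # ['a', 'b', 'c', 'd', 'e']

# With one unique key per row, the field count grows linearly with the row count.
many = [json.dumps({f"k{i}": i}) for i in range(100_000)]
print(len(infer_struct_fields(many)))  # 100000
```

      Five rows give five fields; a hundred thousand rows with unique keys give a hundred thousand fields, which is the situation the issue describes.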

      In this case the inferred type should be a MapType instead. We need a clear rule for deciding whether a StructType or a MapType is used for a dict in JSON data.
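      One possible shape for such a rule, sketched in pure Python under stated assumptions: treat keys as data (and pick a map) when the values share a single type and either there are too many distinct keys or keys rarely repeat across rows. The function `choose_dict_type` and its thresholds `max_fields` and `min_key_reuse` are hypothetical; this is an illustration, not the rule Spark adopted.

```python
import json
from collections import Counter


def choose_dict_type(lines, max_fields=100, min_key_reuse=2.0):
    """Return "MapType" or "StructType" for a column of JSON objects.

    Hypothetical heuristic for illustration only, NOT Spark's rule.
    - max_fields: above this many distinct keys, a struct is considered too wide.
    - min_key_reuse: minimum average number of rows each key appears in;
      keys that rarely repeat (e.g. one unique key per row) look like data.
    """
    key_counts = Counter()
    value_types = set()
    for line in lines:
        obj = json.loads(line)
        key_counts.update(obj.keys())
        value_types.update(type(v).__name__ for v in obj.values())
    if not key_counts:
        return "StructType"
    # Average number of rows in which each distinct key occurs.
    avg_reuse = sum(key_counts.values()) / len(key_counts)
    # A map has one value type, so only switch when the values are homogeneous.
    homogeneous = len(value_types) == 1
    if homogeneous and (len(key_counts) > max_fields or avg_reuse < min_key_reuse):
        return "MapType"
    return "StructType"


unique_rows = ['{"a": 1}', '{"b": 2}', '{"c": 3}', '{"d": 4}', '{"e": 5}']
print(choose_dict_type(unique_rows))  # MapType
```

      Requiring homogeneous values is a design choice: a struct can hold differently typed fields, while a map needs a single value type, so mixed rows such as `{"name": "x", "age": 1}` stay a StructType under this sketch.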

      cc yhuai marmbrus

    People

    • Assignee: Unassigned
    • Reporter: Davies Liu (davies)
    • Votes: 0
    • Watchers: 5
