Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-10705

Stop converting internal rows to external rows in DataFrame.toJSON

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.3.1, 1.4.1, 1.5.0
    • 1.6.0
    • SQL
    • None

    Description

      DataFrame.toJSON uses DataFrame.mapPartitions, which converts internal rows to external rows. We can use queryExecution.toRdd.mapPartitions instead for better performance.

      Another issue is that, for UDT values, serialize produces internal types. So currently we must deal with both internal and external types within toJSON (see here), which is pretty weird.

      Attachments

        Activity

          People

            viirya L. C. Hsieh
            lian cheng Cheng Lian
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: