Description
DataFrame.toJSON uses DataFrame.mapPartitions, which converts internal rows to external rows. We can use queryExecution.toRdd.mapPartitions instead for better performance.
Another issue is that, for UDT values, serialize produces internal types. So currently we must deal with both internal and external types within toJSON (see here), which is pretty weird.
Attachments
Issue Links
- links to