Description
It is a common pattern to apply multiple transforms to a Dataset (using Dataset.withColumn, for example). This is currently quite expensive because we run CheckAnalysis on the full plan and create an encoder for each intermediate Dataset.
CheckAnalysis only needs to be run for the newly added plan components, and not for the full plan. The addition of the AnalysisBarrier created this issue.
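To illustrate why checking only the newly added components matters, here is a minimal sketch in plain Python. It is a toy model, not Spark code: the Node class, the analyzed flag, and the two check functions are hypothetical names introduced for illustration, and a "check" is reduced to counting node visits. It assumes each transform wraps the previous plan in exactly one new node.

```python
# Toy model, not Spark code: a "plan" is a chain of nodes, and each
# transform wraps the previous plan in one new node.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    child: Optional["Node"] = None
    analyzed: bool = False  # hypothetical flag: this subtree was already checked


def check_full(plan, counter):
    """Visit every node in the plan, as a full re-check would."""
    node = plan
    while node is not None:
        counter["visits"] += 1
        node = node.child


def check_incremental(plan, counter):
    """Visit only nodes not yet marked as analyzed, then mark them."""
    node = plan
    while node is not None and not node.analyzed:
        counter["visits"] += 1
        node.analyzed = True
        node = node.child


def chain_transforms(n, check):
    """Apply n transforms to a one-node plan, checking after each one."""
    counter = {"visits": 0}
    plan = Node("source")
    check(plan, counter)
    for i in range(n):
        plan = Node(f"project_{i}", child=plan)
        check(plan, counter)
    return counter["visits"]


full_visits = chain_transforms(100, check_full)
incremental_visits = chain_transforms(100, check_incremental)
print(full_visits, incremental_visits)  # prints "5151 101"
```

In this model, re-checking the full plan after each of n transforms costs on the order of n^2 node visits, while checking only the new nodes costs on the order of n. The real cost in Spark also includes encoder creation for each intermediate Dataset, which this sketch does not model.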
Issue Links
- relates to: SPARK-17006 WithColumn Performance Degrades with Number of Invocations (Resolved)