Spark / SPARK-23223

Stacking dataset transforms performs poorly

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      It is a common pattern to apply multiple transforms to a Dataset (using Dataset.withColumn, for example). This is currently quite expensive because we run CheckAnalysis on the full plan and create an encoder for each intermediate Dataset.

      CheckAnalysis only needs to be run for the newly added plan components, and not for the full plan. The addition of the AnalysisBarrier created this issue.
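
To illustrate why re-checking the full plan on every transform adds up, here is a minimal toy model in plain Python. It is not Spark code: PlanNode, check_full_plan, check_new_node_only and stack_transforms are hypothetical names invented for this sketch, and a "check" is reduced to counting visited plan nodes. Under those assumptions, stacking n transforms costs on the order of n² node visits when the whole plan is checked each time, versus n visits when only the newly added node is checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanNode:
    """Hypothetical stand-in for one logical plan operator (not a Spark class)."""
    name: str
    child: Optional["PlanNode"] = None


def check_full_plan(plan: PlanNode) -> int:
    """Visit every node from the newest operator down to the leaf; return visits."""
    visits = 0
    node: Optional[PlanNode] = plan
    while node is not None:
        visits += 1
        node = node.child
    return visits


def check_new_node_only(plan: PlanNode) -> int:
    """Visit only the newest operator, treating its child as already checked."""
    return 1


def stack_transforms(n: int, check) -> int:
    """Stack n transforms on a leaf relation, checking after each; return total visits."""
    plan = PlanNode("relation")
    total = 0
    for i in range(n):
        # Each transform wraps the previous plan in one new operator,
        # loosely mimicking what a chained withColumn call would do.
        plan = PlanNode(f"project_{i}", plan)
        total += check(plan)
    return total


full_cost = stack_transforms(100, check_full_plan)
incremental_cost = stack_transforms(100, check_new_node_only)
print(full_cost, incremental_cost)
```

With 100 stacked transforms, the full-plan variant visits 5150 nodes in this model while the incremental variant visits 100. The model ignores encoder creation and the real cost of each analysis rule, so it only shows the growth pattern, not actual Spark timings.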

            People

              Assignee: hvanhovell Herman van Hövell
              Reporter: hvanhovell Herman van Hövell
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: