  Spark / SPARK-23223

Stacking dataset transforms performs poorly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels:
      None
    • Target Version/s:

      Description

      It is a common pattern to apply multiple transforms to a Dataset (using Dataset.withColumn, for example). This is currently quite expensive because we run CheckAnalysis on the full plan and create an encoder for each intermediate Dataset.

      CheckAnalysis only needs to be run for the newly added plan components, and not for the full plan. The addition of the AnalysisBarrier created this issue.
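
      A rough way to see the cost difference is a toy model, sketched below. It is not Spark code: `Node`, `check_full`, `check_incremental`, and `stack_transforms` are hypothetical names invented for illustration, and the plan is reduced to a simple chain of operators. The sketch only counts how many plan nodes get visited when each stacked transform re-checks the whole plan versus only the newly added part.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """One operator in a toy logical plan (hypothetical, not Spark's LogicalPlan)."""
    name: str
    child: Optional["Node"] = None
    checked: bool = False  # set once this node has been validated


def check_full(plan: Node) -> int:
    """Validate every node in the plan; return how many nodes were visited."""
    visited = 0
    node = plan
    while node is not None:
        visited += 1
        node = node.child
    return visited


def check_incremental(plan: Node) -> int:
    """Validate only nodes not yet marked as checked; stop at the first checked one."""
    visited = 0
    node = plan
    while node is not None and not node.checked:
        node.checked = True
        visited += 1
        node = node.child
    return visited


def stack_transforms(n: int, check: Callable[[Node], int]) -> int:
    """Stack n transforms on a base scan, running `check` after each step."""
    plan = Node("scan")
    total = check(plan)
    for i in range(n):
        # Each transform wraps the previous plan in one new operator.
        plan = Node(f"project_{i}", child=plan)
        total += check(plan)
    return total


full_cost = stack_transforms(100, check_full)                # grows quadratically with n
incremental_cost = stack_transforms(100, check_incremental)  # grows linearly with n
print(full_cost, incremental_cost)
```

      In this model, 100 stacked transforms cost 5151 node visits with the full re-check and 101 with the incremental one. The numbers say nothing about real Spark timings; they only illustrate why re-checking the full plan on every step scales poorly.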

              People

              • Assignee:
                hvanhovell Herman van Hovell
                Reporter:
                hvanhovell Herman van Hovell
              • Votes:
                0
                Watchers:
                4
