Spark / SPARK-23192

Hint is lost after using cached data


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      The hint attached to a plan segment is lost if that segment is replaced by cached data.

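            // Assumes spark-shell, where `spark` is predefined
            import org.apache.spark.sql.functions.broadcast
            val df1 = spark.createDataFrame(Seq((1, "4"), (2, "2"))).toDF("key", "value")
            val df2 = spark.createDataFrame(Seq((1, "1"), (2, "2"))).toDF("key", "value")
            df2.cache()  // df2's plan is marked for caching (materialized lazily)
            // broadcast() wraps df2's plan in a hint; on affected versions, substituting the cached data drops it
            val df3 = df1.join(broadcast(df2), Seq("key"), "inner")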
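      One way to see whether the hint survived is to inspect the physical plan Spark selects for df3. This is a minimal sketch, assuming Spark 2.3.0 or later (the fix version above) with Spark SQL on the classpath; the object and method names are hypothetical. Size-based broadcasting is disabled so that a BroadcastHashJoinExec can only come from the hint, and adaptive execution is disabled on the assumption that, on newer releases, it would hide the join node inside an adaptive wrapper until the query runs. On an affected version the check would be expected to return false.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec

// Hypothetical helper; not part of Spark
object HintAfterCacheCheck {
  /** Runs the reproduction from this issue and reports whether a broadcast hash join was planned. */
  def hintSurvivesCache(spark: SparkSession): Boolean = {
    val df1 = spark.createDataFrame(Seq((1, "4"), (2, "2"))).toDF("key", "value")
    val df2 = spark.createDataFrame(Seq((1, "1"), (2, "2"))).toDF("key", "value")
    df2.cache()
    val df3 = df1.join(broadcast(df2), Seq("key"), "inner")

    // Collect every broadcast hash join node in df3's physical plan
    val broadcastJoins = df3.queryExecution.executedPlan.collect {
      case j: BroadcastHashJoinExec => j
    }
    df2.unpersist()
    broadcastJoins.nonEmpty
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("hint-after-cache-check")
      // Disable size-based broadcasting so only the explicit hint can produce a broadcast join
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      // Assumption: without adaptive execution the join node is visible in executedPlan before running
      .config("spark.sql.adaptive.enabled", "false")
      .getOrCreate()
    try println(s"hint respected: ${hintSurvivesCache(spark)}")
    finally spark.stop()
  }
}
```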
      

      The broadcast hint is lost in df3, so the planner does not take it into account when choosing the physical join algorithm.
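      The substitution described above can be pictured with a deliberately simplified, hypothetical model. The node types (Scan, CachedScan, Hint, Join) and both substitution functions are illustrative assumptions, not Spark's classes or its actual fix: `naive` replaces any fragment whose hint-free form matches a cached plan, which discards the hint wrapper along with it, while `preserving` keeps each hint node and substitutes the cached data beneath it.

```scala
// A toy model of cache substitution. All names are hypothetical and only
// illustrate the mechanism; they are not Spark's classes or its fix.
object CacheHintModel {
  sealed trait Plan
  final case class Scan(name: String) extends Plan
  final case class CachedScan(name: String) extends Plan
  final case class Hint(hint: String, child: Plan) extends Plan
  final case class Join(left: Plan, right: Plan) extends Plan

  // A hint does not change what a fragment computes, so cache matching ignores it
  def stripHints(p: Plan): Plan = p match {
    case Hint(_, c) => stripHints(c)
    case other      => other
  }

  // Naive: a fragment whose hint-free form is cached is replaced wholesale,
  // so a Hint wrapping that fragment is discarded with it
  def naive(p: Plan, cache: Map[Plan, Plan]): Plan =
    cache.get(stripHints(p)) match {
      case Some(cached) => cached
      case None => p match {
        case Join(l, r) => Join(naive(l, cache), naive(r, cache))
        case Hint(h, c) => Hint(h, naive(c, cache))
        case other      => other
      }
    }

  // Preserving: keep each Hint node and substitute cached data beneath it
  def preserving(p: Plan, cache: Map[Plan, Plan]): Plan = p match {
    case Hint(h, c)                       => Hint(h, preserving(c, cache))
    case Join(l, r) if !cache.contains(p) => Join(preserving(l, cache), preserving(r, cache))
    case other                            => cache.getOrElse(other, other)
  }

  // True if a broadcast hint is still visible anywhere in the plan
  def hasBroadcastHint(p: Plan): Boolean = p match {
    case Hint("broadcast", _) => true
    case Hint(_, c)           => hasBroadcastHint(c)
    case Join(l, r)           => hasBroadcastHint(l) || hasBroadcastHint(r)
    case _                    => false
  }

  def main(args: Array[String]): Unit = {
    // Mirrors the report: df2 is cached, then joined under a broadcast hint
    val cache = Map[Plan, Plan](Scan("df2") -> CachedScan("df2"))
    val query = Join(Scan("df1"), Hint("broadcast", Scan("df2")))
    println(naive(query, cache))      // Join(Scan(df1),CachedScan(df2))
    println(preserving(query, cache)) // Join(Scan(df1),Hint(broadcast,CachedScan(df2)))
  }
}
```

      Re-wrapping keeps the hint visible to the join planner while still reading from the cached data; this report does not say whether Spark's own fix works exactly this way.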


          People

            Assignee: Xiao Li (smilegator)
            Reporter: Xiao Li (smilegator)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved: