SPARK-23192

Hint is lost after using cached data

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels:
      None

      Description

      When a plan segment is replaced by cached data, the hint attached to that plan segment is lost.

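            import org.apache.spark.sql.functions.broadcast

            val df1 = spark.createDataFrame(Seq((1, "4"), (2, "2"))).toDF("key", "value")
            val df2 = spark.createDataFrame(Seq((1, "1"), (2, "2"))).toDF("key", "value")
            // Cache df2 so that its plan segment is replaced by the cached data.
            df2.cache()
            // The broadcast hint on df2 should carry through to the join.
            val df3 = df1.join(broadcast(df2), Seq("key"), "inner")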
      

      The broadcast hint is lost in df3, so the physical join algorithm chosen for df3 does not respect it.
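
      One way to observe the behavior described above is to disable size-based automatic broadcasting, so that only the explicit hint can produce a broadcast join, and then count the broadcast hash joins in the physical plan. The following is a minimal sketch, not the test shipped with the fix: the object name, the local master setting, and the choice to inspect queryExecution.sparkPlan are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec
import org.apache.spark.sql.functions.broadcast

// Hypothetical standalone check; the object name is illustrative only.
object HintAfterCacheCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("hint-after-cache-check")
      // Turn off size-based automatic broadcast so that only the
      // explicit broadcast() hint can lead to a broadcast join.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()

    val df1 = spark.createDataFrame(Seq((1, "4"), (2, "2"))).toDF("key", "value")
    val df2 = spark.createDataFrame(Seq((1, "1"), (2, "2"))).toDF("key", "value")
    df2.cache()
    val df3 = df1.join(broadcast(df2), Seq("key"), "inner")

    // Collect the broadcast hash join operators from the planned physical plan.
    val broadcastJoins = df3.queryExecution.sparkPlan.collect {
      case j: BroadcastHashJoinExec => j
    }
    println(s"Broadcast hash joins in plan: ${broadcastJoins.size}")

    spark.stop()
  }
}
```

      If the hint survives caching, the count would be expected to be 1. On an affected version the planner would be expected to fall back to a non-broadcast join, giving 0.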

            People

            • Assignee:
              Xiao Li (smilegator)
            • Reporter:
              Xiao Li (smilegator)
            • Votes:
              0
            • Watchers:
              5
