Spark / SPARK-14948

Exception when joining DataFrames derived from the same DataFrame

Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      The Spark analyzer throws the following exception in a specific scenario:

      Exception:

      org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing from asd#5,F2#4,F1#6,F2#7 in operator !Project asd#5,F1#3;
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)

      Code:

      SparkClient.java
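          import java.util.Arrays;
          import org.apache.spark.api.java.JavaRDD;
          import org.apache.spark.sql.DataFrame;
          import org.apache.spark.sql.Row;
          import org.apache.spark.sql.RowFactory;
          import org.apache.spark.sql.types.DataTypes;
          import org.apache.spark.sql.types.Metadata;
          import org.apache.spark.sql.types.StructField;
          import org.apache.spark.sql.types.StructType;

          // sparkClient is not a Spark class; it appears to be the reporter's own wrapper
          // that exposes a JavaSparkContext and a HiveContext.
          // Build a one-row DataFrame with two nullable string columns, F1 and F2.
          StructField[] fields = new StructField[2];
          fields[0] = new StructField("F1", DataTypes.StringType, true, Metadata.empty());
          fields[1] = new StructField("F2", DataTypes.StringType, true, Metadata.empty());
          JavaRDD<Row> rdd =
              sparkClient.getJavaSparkContext().parallelize(Arrays.asList(RowFactory.create("a", "b")));
          DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new StructType(fields));
          sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1");

          // Derive a second DataFrame from the first, renaming F1 to asd.
          DataFrame aliasedDf = sparkClient.getSparkHiveContext().sql("select F1 as asd, F2 from t1");

          sparkClient.getSparkHiveContext().registerDataFrameAsTable(aliasedDf, "t2");
          sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t3");

          // Join the derived DataFrame back to its source on F2, then select one column from each side.
          DataFrame join = aliasedDf.join(df, aliasedDf.col("F2").equalTo(df.col("F2")), "inner");
          DataFrame select = join.select(aliasedDf.col("asd"), df.col("F1"));
          select.collect();  // throws the AnalysisException shown above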

      Observations:

      • The issue depends on the data type of the fields in the initial DataFrame: if the data type is not String, the join works.
      • It works if the DataFrames are registered as temporary tables and the join is written as a SQL query (select a.asd,b.F1 from t2 a inner join t3 b on a.F2=b.F2).
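
The attribute IDs in the exception message hint at a likely mechanism. Both DataFrames expose F2#4, so the analyzer seems to give the right side of the join fresh IDs (F1#6, F2#7), while df.col("F1") still points at the old F1#3. The plain-Java sketch below is only a toy model of that behaviour under this assumption, not Spark code: Attr, joinOutput, and missing are hypothetical names, and it does not try to explain the data-type observation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (NOT Spark code): all names here are hypothetical illustrations.
class SelfJoinAttributeSketch {

    // Stand-in for a resolved attribute: a column name plus a unique ID, printed as name#id.
    static final class Attr {
        final String name;
        final int id;

        Attr(String name, int id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public String toString() {
            return name + "#" + id;
        }
    }

    // Output of a join in this model: if the right side shares any ID with the left,
    // every right-side attribute is re-created with a fresh ID starting at firstFreeId.
    static List<Attr> joinOutput(List<Attr> left, List<Attr> right, int firstFreeId) {
        Set<Integer> leftIds = new HashSet<>();
        for (Attr a : left) {
            leftIds.add(a.id);
        }
        boolean conflict = false;
        for (Attr a : right) {
            if (leftIds.contains(a.id)) {
                conflict = true;
            }
        }
        List<Attr> out = new ArrayList<>(left);
        int next = firstFreeId;
        for (Attr a : right) {
            out.add(conflict ? new Attr(a.name, next++) : a);
        }
        return out;
    }

    // Attributes in the projection whose IDs the child operator does not produce.
    static List<Attr> missing(List<Attr> projectList, List<Attr> childOutput) {
        Set<Integer> available = new HashSet<>();
        for (Attr a : childOutput) {
            available.add(a.id);
        }
        List<Attr> result = new ArrayList<>();
        for (Attr a : projectList) {
            if (!available.contains(a.id)) {
                result.add(a);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // IDs copied from the exception message in this report.
        List<Attr> df = Arrays.asList(new Attr("F1", 3), new Attr("F2", 4));
        List<Attr> aliasedDf = Arrays.asList(new Attr("asd", 5), new Attr("F2", 4));

        List<Attr> joined = joinOutput(aliasedDf, df, 6);
        // The select uses references captured before the join: aliasedDf.col("asd"), df.col("F1").
        List<Attr> projectList = Arrays.asList(aliasedDf.get(0), df.get(0));

        System.out.println("join output: " + joined);                        // [asd#5, F2#4, F1#6, F2#7]
        System.out.println("missing:     " + missing(projectList, joined));  // [F1#3]
    }
}
```

In this model the SQL variant in the second observation avoids the problem because a.asd and b.F1 are resolved by name against the join's output, so no stale ID is carried over from the original DataFrame.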

            People

              Unassigned
              Saurabh Santhosh
              Michael Armbrust
              Votes: 12
              Watchers: 28