Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Resolved
-
2.4.3
-
None
-
I have tried this on AWS Glue with Spark 2.4.3
and on windows 10 with 2.4.4
at both of them facing same issue
Description
I am writing a program to analyze sql query. So I am using Spark logical plan.I am writing a program to analyze sql query. So I am using Spark logical plan.
Below is the code which I am using
object QueryAnalyzer { val LOG = LoggerFactory.getLogger(this.getClass) //Spark Conf val conf = new SparkConf().setMaster("local[2]").setAppName("LocalEdlExecutor") //Spark Context val sc = new SparkContext(conf) //sql Context val sqlContext = new SQLContext(sc) //Spark Session val sparkSession = SparkSession .builder() .appName("Spark User Data") .config("spark.app.name", "LocalEdl") .getOrCreate() def main(args: Array[String]) { var inputDfColumns = Map[String,List[String]]() val dfSession = sparkSession.read.format("csv"). option("header", "true"). option("inferschema", "true"). option("delimiter", ",").option("decoding", "utf8").option("multiline", true) var oDF = dfSession. load("C:\\Users\\tarun.khaneja\\data\\order.csv") println("smaple data in oDF====>") oDF.show() var cusDF = dfSession. load("C:\\Users\\tarun.khaneja\\data\\customer.csv") println("smaple data in cusDF====>") cusDF.show() oDF.createOrReplaceTempView("orderTempView") cusDF.createOrReplaceTempView("customerTempView") //get input columns from all dataframe inputDfColumns += ("orderTempView"->oDF.columns.toList) inputDfColumns += ("customerTempView"->cusDF.columns.toList) val res = sqlContext.sql("""select OID, max(MID+CID) as MID_new,ROW_NUMBER() OVER ( ORDER BY CID) as rn from (select OID_1 as OID, CID_1 as CID, OID_1+CID_1 as MID from (select min(ot.OrderID) as OID_1, ct.CustomerID as CID_1 from orderTempView as ot inner join customerTempView as ct on ot.CustomerID = ct.CustomerID group by CID_1)) group by OID,CID""") println(res.show(false)) val analyzedPlan = res.queryExecution.analyzed println(analyzedPlan.prettyJson) }
Now problem is, with Spark 2.2.1, I am getting below json. where I have SubqueryAlias which provide important information of alias name for table which we used in query, as shown below.
But with Spark 2.4, I am getting SubqueryAlias name as null. As shown below in json screenshot
So, I am not sure if it is bug in Spark 2.4 because of which I am getting name as null in SubquerAlias.
Or if it is not bug then how can I get relation between alias name and real table name.
Any idea on this?