[SPARK-29186] SubqueryAlias name value is null in Spark 2.4.3 Logical plan. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Resolved
Affects Version/s: 2.4.3
Fix Version/s: 2.4.5, 3.0.0
Component/s: SQL
Labels:
None
Environment:

I have tried this on AWS Glue with Spark 2.4.3 and on Windows 10 with Spark 2.4.4, and I face the same issue on both.

Flags:

Patch, Important
External issue URL:
https://stackoverflow.com/questions/58016993/subqueryalias-is-null-in-spark-2-4-logical-plan-is-it-bug

Description

I am writing a program to analyze SQL queries, so I am using the Spark logical plan.

Below is the code I am using:

import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory

object QueryAnalyzer {
  val LOG = LoggerFactory.getLogger(this.getClass)

  // Spark session (also creates the underlying SparkContext)
  val sparkSession = SparkSession
    .builder()
    .master("local[2]")
    .appName("LocalEdlExecutor")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    var inputDfColumns = Map[String, List[String]]()

    // Shared CSV reader configuration for both input files
    val dfSession = sparkSession.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("delimiter", ",")
      .option("encoding", "UTF-8")
      .option("multiLine", true)

    val oDF = dfSession.load("C:\\Users\\tarun.khaneja\\data\\order.csv")
    println("sample data in oDF====>")
    oDF.show()

    val cusDF = dfSession.load("C:\\Users\\tarun.khaneja\\data\\customer.csv")
    println("sample data in cusDF====>")
    cusDF.show()

    oDF.createOrReplaceTempView("orderTempView")
    cusDF.createOrReplaceTempView("customerTempView")

    // Get input columns from all DataFrames
    inputDfColumns += ("orderTempView" -> oDF.columns.toList)
    inputDfColumns += ("customerTempView" -> cusDF.columns.toList)

    val res = sparkSession.sql(
      """select OID, max(MID + CID) as MID_new,
        |       ROW_NUMBER() OVER (ORDER BY CID) as rn
        |from (select OID_1 as OID, CID_1 as CID, OID_1 + CID_1 as MID
        |      from (select min(ot.OrderID) as OID_1, ct.CustomerID as CID_1
        |            from orderTempView as ot
        |            inner join customerTempView as ct
        |              on ot.CustomerID = ct.CustomerID
        |            group by CID_1))
        |group by OID, CID""".stripMargin)

    // show() prints the rows itself and returns Unit
    res.show(false)

    // Analyzed logical plan, serialized as JSON
    val analyzedPlan = res.queryExecution.analyzed
    println(analyzedPlan.prettyJson)
  }
}

Now the problem: with Spark 2.2.1 I get the JSON below, in which SubqueryAlias provides important information, namely the alias name of each table used in the query (see the attached screenshots).

But with Spark 2.4, the SubqueryAlias name is null, as the JSON screenshot shows.

So I am not sure whether this is a bug in Spark 2.4 that causes the name in SubqueryAlias to be null.
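A toy sketch of one way a name can come out as null in a JSON dump, using hypothetical types (`AliasId`, `ToyJson`) rather than Spark's own `TreeNode` serialization code: a converter that recognizes only a fixed set of field types falls back to null for anything else, so a name held inside a wrapper case class is lost unless the wrapper gets its own case. Whether this is what happens to `SubqueryAlias` in Spark 2.4 is an assumption here, not something this report establishes.

```scala
// Hypothetical wrapper type holding an alias name (NOT Spark's own class)
final case class AliasId(identifier: String, database: Option[String])

object ToyJson {
  // Converts only a few known field types; anything else becomes JSON null
  def fieldToJson(value: Any): String = value match {
    case s: String  => "\"" + s + "\""
    case i: Int     => i.toString
    case b: Boolean => b.toString
    case _          => "null"
  }

  // Same converter with an explicit case for the wrapper type
  def fieldToJsonWithAlias(value: Any): String = value match {
    case AliasId(id, _) => fieldToJson(id)
    case other          => fieldToJson(other)
  }
}
```

Here `ToyJson.fieldToJson(AliasId("ot", None))` yields `null`, while `ToyJson.fieldToJsonWithAlias(AliasId("ot", None))` yields `"ot"`.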

If it is not a bug, how can I get the relation between an alias name and the real table name?
Any ideas?
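For the second question, one approach is to walk the plan tree instead of reading the JSON, pairing each alias with the relation beneath it. The sketch below uses a simplified, hypothetical plan model (`Plan`, `Relation`, `Alias`, `Join`, `Project`, `AliasMapping`), not Spark's classes, and assumes each relation node carries its table or view name; an alias over anything other than a (possibly re-aliased) relation, such as a derived table, is skipped.

```scala
// Hypothetical, simplified plan nodes (NOT Spark's LogicalPlan classes)
sealed trait Plan
final case class Relation(table: String) extends Plan
final case class Alias(name: String, child: Plan) extends Plan
final case class Join(left: Plan, right: Plan) extends Plan
final case class Project(child: Plan) extends Plan

object AliasMapping {
  // Direct children of a node
  def children(p: Plan): Seq[Plan] = p match {
    case Relation(_)  => Nil
    case Alias(_, c)  => Seq(c)
    case Join(l, r)   => Seq(l, r)
    case Project(c)   => Seq(c)
  }

  // Follow nested aliases down to a relation; None for any other subtree
  def underlyingTable(p: Plan): Option[String] = p match {
    case Relation(t) => Some(t)
    case Alias(_, c) => underlyingTable(c)
    case _           => None
  }

  // Collect alias -> table for every alias that sits over a relation
  def aliasToTable(plan: Plan): Map[String, String] = {
    val here = plan match {
      case Alias(name, child) =>
        underlyingTable(child).map(t => Map(name -> t)).getOrElse(Map.empty[String, String])
      case _ => Map.empty[String, String]
    }
    children(plan).foldLeft(here)((acc, c) => acc ++ aliasToTable(c))
  }
}
```

With Spark itself, the same traversal would start from `analyzedPlan` and pattern-match on `SubqueryAlias` nodes (for example through the plan's `collect` method); the constructor fields of `SubqueryAlias` differ between Spark versions, so that mapping is an assumption to check against the version in use.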

Attachments

image-2019-09-25-12-17-53-552.png (25/Sep/19 06:47, 81 kB, Tarun Khaneja)
image-2019-09-25-12-21-52-136.png (25/Sep/19 06:51, 70 kB, Tarun Khaneja)

Issue Links

links to

GitHub Pull Request #25959

GitHub Pull Request #25970

People

Assignee: L. C. Hsieh

Reporter: Tarun Khaneja

Votes: 0

Watchers: 2

Dates

Created: 20/Sep/19 06:51

Updated: 12/Dec/22 18:11

Resolved: 30/Sep/19 03:04