Spark / SPARK-29186

SubqueryAlias name value is null in Spark 2.4.3 Logical plan.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Resolved
    • Affects Version/s: 2.4.3
    • Fix Version/s: 2.4.5, 3.0.0
    • Component/s: SQL
    • Labels: None
    • Environment: I have tried this on AWS Glue with Spark 2.4.3 and on Windows 10 with Spark 2.4.4, and I face the same issue on both.

    Description

      I am writing a program to analyze SQL queries, so I am using the Spark logical plan.

      Below is the code I am using:
         

      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.{SQLContext, SparkSession}
      import org.slf4j.LoggerFactory

      object QueryAnalyzer {
        val LOG = LoggerFactory.getLogger(this.getClass)

        // Spark conf
        val conf = new SparkConf().setMaster("local[2]").setAppName("LocalEdlExecutor")

        // Spark context
        val sc = new SparkContext(conf)

        // SQL context (deprecated since Spark 2.0 in favor of SparkSession; kept to match the original setup)
        val sqlContext = new SQLContext(sc)

        // Spark session; getOrCreate() reuses the SparkContext created above
        val sparkSession = SparkSession
          .builder()
          .appName("Spark User Data")
          .config("spark.app.name", "LocalEdl")
          .getOrCreate()

        def main(args: Array[String]): Unit = {
          // Temp view name -> input column names
          var inputDfColumns = Map[String, List[String]]()

          // Shared CSV reader options
          val dfSession = sparkSession.read
            .format("csv")
            .option("header", "true")
            .option("inferSchema", "true")
            .option("delimiter", ",")
            .option("encoding", "UTF-8")
            .option("multiLine", true)

          val oDF = dfSession.load("C:\\Users\\tarun.khaneja\\data\\order.csv")
          println("sample data in oDF====>")
          oDF.show()

          val cusDF = dfSession.load("C:\\Users\\tarun.khaneja\\data\\customer.csv")
          println("sample data in cusDF====>")
          cusDF.show()

          oDF.createOrReplaceTempView("orderTempView")
          cusDF.createOrReplaceTempView("customerTempView")

          // Get input columns from all DataFrames
          inputDfColumns += ("orderTempView" -> oDF.columns.toList)
          inputDfColumns += ("customerTempView" -> cusDF.columns.toList)

          val res = sqlContext.sql(
            """select OID, max(MID+CID) as MID_new, ROW_NUMBER() OVER (ORDER BY CID) as rn
              |from (select OID_1 as OID, CID_1 as CID, OID_1+CID_1 as MID
              |      from (select min(ot.OrderID) as OID_1, ct.CustomerID as CID_1
              |            from orderTempView as ot
              |            inner join customerTempView as ct on ot.CustomerID = ct.CustomerID
              |            group by CID_1))
              |group by OID, CID""".stripMargin)

          // show() prints the rows itself and returns Unit, so it is not wrapped in println
          res.show(false)

          val analyzedPlan = res.queryExecution.analyzed
          println(analyzedPlan.prettyJson)
        }
      }
      

       
      The problem: with Spark 2.2.1, the JSON I get contains SubqueryAlias nodes that carry important information, namely the alias name of each table used in the query (see the attached screenshot).

      But with Spark 2.4, the SubqueryAlias name in the JSON is null (also shown in an attached screenshot).

      So I am not sure whether this is a bug in Spark 2.4 that causes the name to be null in SubqueryAlias.

      If it is not a bug, how can I get the mapping between each alias name and the real table name?
      Any ideas?
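      One possible workaround, sketched here under assumptions rather than as a confirmed fix, is to skip prettyJson and walk the analyzed plan directly, reading each SubqueryAlias node's `alias` string. The sketch assumes Spark 2.4 or later, and assumes that an alias over a temp view (such as `ot`) shows up as a SubqueryAlias whose direct child is the view's own SubqueryAlias. It relies on internal Catalyst classes, so the nesting may differ between versions, and the view name may come back lowercased under the default case-insensitive analysis. The object `AliasExtractor`, the helpers `aliasToView` and `allAliases`, and the small in-memory tables are hypothetical stand-ins for the CSV data above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}

// Hypothetical helper object; uses internal Catalyst classes (Spark 2.4+ assumed).
object AliasExtractor {

  /** (alias, underlying view name) for every SubqueryAlias that directly wraps another SubqueryAlias. */
  def aliasToView(plan: LogicalPlan): Seq[(String, String)] =
    plan.collect {
      case outer @ SubqueryAlias(_, inner: SubqueryAlias) => outer.alias -> inner.alias
    }

  /** Every SubqueryAlias name in the plan, in pre-order traversal order. */
  def allAliases(plan: LogicalPlan): Seq[String] =
    plan.collect { case s: SubqueryAlias => s.alias }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("AliasExtractor").getOrCreate()
    import spark.implicits._

    // Small in-memory stand-ins for order.csv and customer.csv
    Seq((1, 10), (2, 20)).toDF("OrderID", "CustomerID").createOrReplaceTempView("orderTempView")
    Seq((10, "a"), (20, "b")).toDF("CustomerID", "Name").createOrReplaceTempView("customerTempView")

    val res = spark.sql(
      """select min(ot.OrderID) as OID_1, ct.CustomerID as CID_1
        |from orderTempView as ot
        |inner join customerTempView as ct on ot.CustomerID = ct.CustomerID
        |group by ct.CustomerID""".stripMargin)

    val plan = res.queryExecution.analyzed
    // treeString renders each node's arguments (including the alias) as text, without prettyJson
    println(plan.treeString)
    println(allAliases(plan).mkString(", "))
    aliasToView(plan).foreach { case (alias, view) => println(s"$alias -> $view") }

    spark.stop()
  }
}
```

      If `aliasToView` comes back empty on a particular Spark version, printing `treeString` is a quick way to see how that version actually nests the SubqueryAlias nodes before adjusting the pattern.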

      Attachments

        1. image-2019-09-25-12-17-53-552.png
          81 kB
          Tarun Khaneja
        2. image-2019-09-25-12-21-52-136.png
          70 kB
          Tarun Khaneja


          People

            Assignee: L. C. Hsieh (viirya)
            Reporter: Tarun Khaneja (tarun.khaneja)
            Votes: 0
            Watchers: 2
