Description
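This bug was introduced by the subquery handling changes. generateTreeString numbers the nodes of a tree including innerChildren (used to print subqueries), but getNodeNumbered ignores innerChildren when counting. As a result, the numbers printed by numberedTreeString do not always match the node that getNodeNumbered returns.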
Repro:
val df = sql("select * from range(10) where id not in " +
  "(select id from range(2) union all select id from range(2))")
println("-------------------------------------------------------")
println(df.queryExecution.analyzed.numberedTreeString)
println("-------------------------------------------------------")
println("-------------------------------------------------------")
println(df.queryExecution.analyzed(3))
println("-------------------------------------------------------")
Output looks like
-------------------------------------------------------
00 Project [id#1L]
01 +- Filter NOT predicate-subquery#0 [(id#1L = id#2L)]
02    :  +- Union
03    :     :- Project [id#2L]
04    :     :  +- Range (0, 2, step=1, splits=None)
05    :     +- Project [id#3L]
06    :        +- Range (0, 2, step=1, splits=None)
07    +- Range (0, 10, step=1, splits=None)
-------------------------------------------------------
-------------------------------------------------------
null
-------------------------------------------------------
Note that node 3 should be the Project node (Project [id#2L]), but getNodeNumbered ignores innerChildren and as a result returns the wrong result (null in this case).
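The mismatch can be illustrated with a small toy model. This is only a sketch and not Spark's actual TreeNode code: Node, printOrder, childrenOnly, buggyLookup and fixedLookup are hypothetical names, and the sketch assumes that inner children are numbered before regular children, as the printed tree above suggests.

```scala
// Toy model, NOT Spark's TreeNode. `innerChildren` stands in for subquery
// plans, which are printed (and numbered) but are not regular children.
final case class Node(
    name: String,
    children: Seq[Node] = Nil,
    innerChildren: Seq[Node] = Nil)

object NumberingDemo {
  // Pre-order walk in print order: inner children first, then regular children.
  // Assumption: this mirrors how the numbered tree string assigns indices.
  def printOrder(n: Node): Seq[Node] =
    n +: (n.innerChildren ++ n.children).flatMap(printOrder)

  // Pre-order walk that skips innerChildren, mimicking the reported bug.
  def childrenOnly(n: Node): Seq[Node] =
    n +: n.children.flatMap(childrenOnly)

  // Lookup that disagrees with the printed numbering.
  def buggyLookup(root: Node, i: Int): Option[Node] = childrenOnly(root).lift(i)

  // Lookup that uses the same traversal as the printed numbering.
  def fixedLookup(root: Node, i: Int): Option[Node] = printOrder(root).lift(i)

  // Same shape as the analyzed plan in the repro (names abbreviated).
  val subquery: Node = Node("Union", Seq(
    Node("Project [id#2L]", Seq(Node("Range (0, 2)"))),
    Node("Project [id#3L]", Seq(Node("Range (0, 2)")))))

  val plan: Node = Node("Project [id#1L]", Seq(
    Node("Filter",
      children = Seq(Node("Range (0, 10)")),
      innerChildren = Seq(subquery))))

  def main(args: Array[String]): Unit = {
    println(buggyLookup(plan, 3).map(_.name)) // None
    println(fixedLookup(plan, 3).map(_.name)) // Some(Project [id#2L])
  }
}
```

In this model the children-only walk sees just three nodes (Project, Filter, Range), so index 3 finds nothing, which corresponds to the null in the output above. Using one traversal for both numbering and lookup keeps the two consistent.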