Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 3.4.2, 3.5.0, 3.5.1, 3.5.2, 3.4.3
None
None
Description
In OpenLineage, a logical plan event is received via SparkEventListener, and the framework parses it to deduce the input and output table names used to build lineage.
The issue is that in Spark 3.4.2 and above (tested and reproducible in 3.4.2 and 3.5.0), the logical plan event sent by Spark core is partial: it is missing the tableName property that was sent in earlier versions (it works in Spark 3.3.4).
Note: This issue is only encountered in drop table events.
For a drop table event, the logical plan in different Spark versions is shown below:
Spark 3.3.4
[
  {
    "class": "org.apache.spark.sql.execution.command.DropTableCommand",
    "num-children": 0,
    "tableName": {
      "product-class": "org.apache.spark.sql.catalyst.TableIdentifier",
      "table": "drop_table_test",
      "database": "default"
    },
    "ifExists": false,
    "isView": false,
    "purge": false
  }
]
Spark 3.4.2
[
  {
    "class": "org.apache.spark.sql.catalyst.plans.logical.DropTable",
    "num-children": 1,
    "child": 0,
    "ifExists": false,
    "purge": false
  },
  {
    "class": "org.apache.spark.sql.catalyst.analysis.ResolvedIdentifier",
    "num-children": 0,
    "catalog": null,
    "identifier": null
  }
]
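To make the difference concrete, here is a minimal Python sketch of how a consumer of these two payloads might try to recover the dropped table name. It is not OpenLineage's actual parsing code: the function name extract_dropped_table and the constants PLAN_3_3_4 and PLAN_3_4_2 are hypothetical, and the JSON strings are copied from the two plans above.

```python
import json

# Serialized plans copied from the two examples in this report.
PLAN_3_3_4 = """[ { "class": "org.apache.spark.sql.execution.command.DropTableCommand",
  "num-children": 0,
  "tableName": { "product-class": "org.apache.spark.sql.catalyst.TableIdentifier",
                 "table": "drop_table_test", "database": "default" },
  "ifExists": false, "isView": false, "purge": false } ]"""

PLAN_3_4_2 = """[ { "class": "org.apache.spark.sql.catalyst.plans.logical.DropTable",
  "num-children": 1, "child": 0, "ifExists": false, "purge": false },
  { "class": "org.apache.spark.sql.catalyst.analysis.ResolvedIdentifier",
    "num-children": 0, "catalog": null, "identifier": null } ]"""


def extract_dropped_table(plan_json):
    """Hypothetical helper: return the dropped table name, or None if absent."""
    for node in json.loads(plan_json):
        cls = node.get("class", "")
        # Spark 3.3.4 shape: DropTableCommand carries a TableIdentifier in "tableName".
        if cls.endswith("DropTableCommand"):
            ident = node.get("tableName") or {}
            table = ident.get("table")
            if table:
                database = ident.get("database")
                return f"{database}.{table}" if database else table
        # Spark 3.4.2 shape: the name would have to come from ResolvedIdentifier,
        # but "identifier" is serialized as null, so nothing can be recovered.
        if cls.endswith("ResolvedIdentifier"):
            ident = node.get("identifier")
            if ident:
                return str(ident)
    return None


print(extract_dropped_table(PLAN_3_3_4))  # default.drop_table_test
print(extract_dropped_table(PLAN_3_4_2))  # None
```

With the 3.3.4 payload the name is recoverable from tableName; with the 3.4.2 payload both catalog and identifier are null, so no table name can be derived from the serialized plan alone.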
More details are in the referenced OpenLineage issue: https://github.com/OpenLineage/OpenLineage/issues/2716