Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Atlas needs table type information to correctly build the lineage graph.
Currently, the lineage log for a CTAS statement looks like this:
{
  "queryText": "create table lineage_ctas as select * from lineage_test",
  "queryId": "774232610e386de9:8111ae3500000000",
  "hash": "ed91deffcdc11c442c2420da3b33d3b3",
  "user": "boroknagyz",
  "timestamp": 1687351038,
  "endTime": 1687351038,
  "edges": [
    { "sources": [ 1 ], "targets": [ 0 ], "edgeType": "PROJECTION" }
  ],
  "vertices": [
    {
      "id": 0,
      "vertexType": "COLUMN",
      "vertexId": "i",
      "metadata": { "tableName": "default.lineage_ctas", "tableCreateTime": 1687351038 }
    },
    {
      "id": 1,
      "vertexType": "COLUMN",
      "vertexId": "default.lineage_test.i",
      "metadata": { "tableName": "default.lineage_test", "tableCreateTime": 1687351020 }
    }
  ]
}
Under "vertices", this is what Atlas would like to see:
"vertices": [
  {
    "id": 0,
    "vertexType": "COLUMN",
    "vertexId": "i",
    "metadata": { "tableName": "default.lineage_ctas", "tableType": "iceberg", "tableCreateTime": 1687351038 }
  },
  {
    "id": 1,
    "vertexType": "COLUMN",
    "vertexId": "default.lineage_test.i",
    "metadata": { "tableName": "default.lineage_test", "tableType": "hive", "tableCreateTime": 1687351020 }
  }
]
So the metadata of each vertex should get a new field: "tableType". For FS-based tables the value should be "hive", except for Iceberg tables, where it should be "iceberg". For Kudu tables it should be "kudu", and for HBase tables it should be "hbase".
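The mapping described above could be sketched roughly as follows. This is only an illustration in Python, not the actual Impala change: `TableKind`, `lineage_table_type`, and `add_table_types` are hypothetical names, and the sketch assumes the caller already knows which kind of table each "tableName" refers to.

```python
from enum import Enum


class TableKind(Enum):
    """Hypothetical classification of a table's storage backend."""
    FS = "fs"            # plain filesystem-based table (e.g. HDFS, S3)
    ICEBERG = "iceberg"  # FS-based, but reported separately
    KUDU = "kudu"
    HBASE = "hbase"


# Values requested in this ticket for the "tableType" metadata field.
_TABLE_TYPE_BY_KIND = {
    TableKind.FS: "hive",
    TableKind.ICEBERG: "iceberg",
    TableKind.KUDU: "kudu",
    TableKind.HBASE: "hbase",
}


def lineage_table_type(kind):
    """Map a table kind to the string written into the lineage log."""
    return _TABLE_TYPE_BY_KIND[kind]


def add_table_types(vertices, kind_by_table):
    """Return a copy of `vertices` with "tableType" added to each metadata.

    `kind_by_table` maps a fully qualified table name to its TableKind.
    Vertices without metadata, or with an unknown table, are copied as-is.
    """
    result = []
    for vertex in vertices:
        if "metadata" not in vertex:
            result.append(dict(vertex))
            continue
        metadata = dict(vertex["metadata"])
        table_name = metadata.get("tableName")
        if table_name in kind_by_table:
            metadata["tableType"] = lineage_table_type(kind_by_table[table_name])
        result.append({**vertex, "metadata": metadata})
    return result
```

Applied to the two vertices from the CTAS example, with `default.lineage_ctas` registered as `TableKind.ICEBERG` and `default.lineage_test` as `TableKind.FS`, this would yield "iceberg" and "hive" respectively. The input list is left unmodified.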