IMPALA-12237

Add information about the table type in the lineage log

Details

    Description

      Atlas needs table type information to correctly build the lineage graph.

      Currently, the lineage log for a CTAS statement looks like this:

      {
        "queryText": "create table lineage_ctas as select * from lineage_test",
        "queryId": "774232610e386de9:8111ae3500000000",
        "hash": "ed91deffcdc11c442c2420da3b33d3b3",
        "user": "boroknagyz",
        "timestamp": 1687351038,
        "endTime": 1687351038,
        "edges": [
          {
            "sources": [
              1
            ],
            "targets": [
              0
            ],
            "edgeType": "PROJECTION"
          }
        ],
        "vertices": [
          {
            "id": 0,
            "vertexType": "COLUMN",
            "vertexId": "i",
            "metadata": {
              "tableName": "default.lineage_ctas",
              "tableCreateTime": 1687351038
            }
          },
          {
            "id": 1,
            "vertexType": "COLUMN",
            "vertexId": "default.lineage_test.i",
            "metadata": {
              "tableName": "default.lineage_test",
              "tableCreateTime": 1687351020
            }
          }
        ]
      }
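
      As a rough illustration of how a consumer might read this record, the sketch below resolves each edge's vertex ids to table names. The `table_edges` helper and the trimmed `LINEAGE_RECORD` string are assumptions made for this example, not part of Impala or Atlas. It also shows that the current metadata carries no table type.

```python
import json

# Trimmed copy of the lineage record above (only the fields used here).
LINEAGE_RECORD = """
{
  "edges": [{"sources": [1], "targets": [0], "edgeType": "PROJECTION"}],
  "vertices": [
    {"id": 0, "vertexType": "COLUMN", "vertexId": "i",
     "metadata": {"tableName": "default.lineage_ctas",
                  "tableCreateTime": 1687351038}},
    {"id": 1, "vertexType": "COLUMN", "vertexId": "default.lineage_test.i",
     "metadata": {"tableName": "default.lineage_test",
                  "tableCreateTime": 1687351020}}
  ]
}
"""


def table_edges(record):
    """Hypothetical helper: list (source_table, target_table, edge_type) tuples."""
    # Index vertices by id so edges can be resolved to their metadata.
    vertices = {v["id"]: v for v in record["vertices"]}
    result = []
    for edge in record["edges"]:
        for src in edge["sources"]:
            for tgt in edge["targets"]:
                result.append((
                    vertices[src]["metadata"]["tableName"],
                    vertices[tgt]["metadata"]["tableName"],
                    edge["edgeType"],
                ))
    return result


record = json.loads(LINEAGE_RECORD)
edges = table_edges(record)
# True when no vertex carries a "tableType" entry, as in the current log.
missing_type = all("tableType" not in v["metadata"] for v in record["vertices"])
```

      With only `tableName` and `tableCreateTime` available, a consumer of this record cannot tell what kind of table each column belongs to.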
      

      Under "vertices", this is what Atlas would like to see:

        "vertices": [
          {
            "id": 0,
            "vertexType": "COLUMN",
            "vertexId": "i",
            "metadata": {
              "tableName": "default.lineage_ctas",
              "tableType": "iceberg",
              "tableCreateTime": 1687351038
            }
          },
          {
            "id": 1,
            "vertexType": "COLUMN",
            "vertexId": "default.lineage_test.i",
            "metadata": {
              "tableName": "default.lineage_test",
              "tableType": "hive",
              "tableCreateTime": 1687351020
            }
          }
        ]
      

      So each vertex's metadata should get a new field: "tableType". For filesystem-based tables the value should be "hive", except for Iceberg tables, where it should be "iceberg". For Kudu tables it should be "kudu", and for HBase tables it should be "hbase".
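
      The mapping described above could be sketched as follows. This is only an illustration of the rule, not Impala's implementation: the storage-kind labels ("fs", "iceberg", "kudu", "hbase"), the `TABLE_TYPE_BY_KIND` table, and the `with_table_type` helper are all hypothetical names, and the example assumes `default.lineage_ctas` is an Iceberg table, as in the desired output.

```python
# Hypothetical storage-kind labels mapped to the requested "tableType" values.
# Impala's real catalog classes are not modeled here.
TABLE_TYPE_BY_KIND = {
    "fs": "hive",        # filesystem-based tables (other than Iceberg)
    "iceberg": "iceberg",
    "kudu": "kudu",
    "hbase": "hbase",
}


def with_table_type(vertex, table_kinds):
    """Return a copy of the vertex whose metadata includes "tableType".

    table_kinds maps a fully qualified table name to one of the
    hypothetical storage-kind labels above.
    """
    metadata = dict(vertex["metadata"])  # copy; leave the input untouched
    kind = table_kinds[metadata["tableName"]]
    metadata["tableType"] = TABLE_TYPE_BY_KIND[kind]
    return {**vertex, "metadata": metadata}


# Assumed storage kinds for the two tables in the example.
kinds = {"default.lineage_ctas": "iceberg", "default.lineage_test": "fs"}

source_vertex = {
    "id": 1,
    "vertexType": "COLUMN",
    "vertexId": "default.lineage_test.i",
    "metadata": {
        "tableName": "default.lineage_test",
        "tableCreateTime": 1687351020,
    },
}
enriched = with_table_type(source_vertex, kinds)
```

      Here `enriched["metadata"]["tableType"]` is "hive" because the source table is assumed to be filesystem-based; the same call on the `default.lineage_ctas` vertex would yield "iceberg".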

          People

            boroknagyz Zoltán Borók-Nagy