  1. Atlas
  2. ATLAS-3290

Impala Hook should get database name and table name from vertex metadata

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 2.1.0
    • None
    • atlas-core
    • None

    Description

      The column name in an Impala lineage record may not include its database name and table name.

      To get the database name and table name of a column, we should use the metadata in its vertex instead of assuming that the column name contains them.
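A minimal sketch of that lookup, assuming simple (non-nested) column names whose short name is the text after the last dot of `vertexId`. The class and method names are hypothetical and are not the code in the attached patch; the only inputs used are the `vertexId` and `metadata.tableName` fields that appear in the lineage record.

```java
// Hypothetical helper for illustration only; not the actual Atlas Impala hook code.
final class VertexColumnNames {
    private VertexColumnNames() {}

    /**
     * Builds "db.table.column" for a COLUMN vertex.
     *
     * @param vertexId          the vertex's "vertexId", e.g. "id" or "sales_db.sales_asia.id"
     * @param metadataTableName the vertex's metadata "tableName", e.g. "sales_db.sales_china"; may be null
     */
    static String fullColumnName(String vertexId, String metadataTableName) {
        if (vertexId == null || vertexId.isEmpty()) {
            throw new IllegalArgumentException("vertexId is null or empty");
        }

        // Short column name: everything after the last dot, or the whole id if there is no dot.
        String column = vertexId.substring(vertexId.lastIndexOf('.') + 1);

        // Preferred path: the metadata already carries "db.table", so trust it.
        if (metadataTableName != null && metadataTableName.indexOf('.') > 0) {
            return metadataTableName + "." + column;
        }

        // Fallback without usable metadata: the vertexId itself must be "db.table.column".
        if (vertexId.split("\\.").length != 3) {
            throw new IllegalArgumentException(
                "fullColumnName " + vertexId + " does not contain database name or table name");
        }
        return vertexId;
    }
}
```

With this sketch, `fullColumnName("id", "sales_db.sales_china")` yields `sales_db.sales_china.id`, while a bare `id` with no metadata still fails, which is the only case where the name cannot be resolved.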

      If we assume that the column name always contains its database name and table name, we run into the following exception:

      I0618 19:16:02.415920 209817 QueryEventHookManager.java:212] Initiating onQueryComplete: org.apache.atlas.impala.hook.ImpalaLineageHook
      E0618 19:16:02.418964 210738 ImpalaLineageHook.java:126] ImpalaLineageHook.process(): failed to process query create table sales_sg as select * from sales_asia
      Java exception follows:
      java.lang.IllegalArgumentException: fullColumnName {} does not contain database name or table name
              at org.apache.atlas.impala.hook.AtlasImpalaHookContext.getQualifiedNameForColumn(AtlasImpalaHookContext.java:115)
              at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getQualifiedName(BaseImpalaEvent.java:164)
              at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getQualifiedName(BaseImpalaEvent.java:134)
              at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getColumnEntities(BaseImpalaEvent.java:495)
              at org.apache.atlas.impala.hook.events.BaseImpalaEvent.toTableEntity(BaseImpalaEvent.java:430)
              at org.apache.atlas.impala.hook.events.BaseImpalaEvent.toTableEntity(BaseImpalaEvent.java:393)
              at org.apache.atlas.impala.hook.events.BaseImpalaEvent.toAtlasEntity(BaseImpalaEvent.java:315)
              at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getInputOutputEntity(BaseImpalaEvent.java:297)
              at org.apache.atlas.impala.hook.events.CreateImpalaProcess.getEntities(CreateImpalaProcess.java:103)
              at org.apache.atlas.impala.hook.events.CreateImpalaProcess.getNotificationMessages(CreateImpalaProcess.java:54)
              at org.apache.atlas.impala.hook.ImpalaLineageHook.process(ImpalaLineageHook.java:122)
              at org.apache.atlas.impala.hook.ImpalaLineageHook.process(ImpalaLineageHook.java:79)
              at org.apache.atlas.impala.hook.ImpalaHook.onQueryComplete(ImpalaHook.java:36)
              at org.apache.atlas.impala.hook.ImpalaLineageHook.onQueryComplete(ImpalaLineageHook.java:52)
              at org.apache.impala.hooks.QueryEventHookManager.lambda$null$1(QueryEventHookManager.java:215)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

      The lineage record from Impala is:

      {  
         "queryText":"create table sales_china as select * from sales_asia",
         "queryId":"2940d0b242de53ea:e82ba8d300000000",
         "hash":"a705a9ec851a5440afca0dfb8df86cd5",
         "user":"root",
         "timestamp":1560885032,
         "endTime":1560885040,
         "edges":[  
            {  
               "sources":[  
                  1
               ],
               "targets":[  
                  0
               ],
               "edgeType":"PROJECTION"
            },
            {  
               "sources":[  
                  3
               ],
               "targets":[  
                  2
               ],
               "edgeType":"PROJECTION"
            }
         ],
         "vertices":[  
            {  
               "id":0,
               "vertexType":"COLUMN",
               "vertexId":"id",
               "metadata":{  
                  "tableName":"sales_db.sales_china",
                  "tableCreateTime":1560885039
               }
            },
            {  
               "id":1,
               "vertexType":"COLUMN",
               "vertexId":"sales_db.sales_asia.id",
               "metadata":{  
                  "tableName":"sales_db.sales_asia",
                  "tableCreateTime":1560884919
               }
            },
            {  
               "id":2,
               "vertexType":"COLUMN",
               "vertexId":"name",
               "metadata":{  
                  "tableName":"sales_db.sales_china",
                  "tableCreateTime":1560885039
               }
            },
            {  
               "id":3,
               "vertexType":"COLUMN",
               "vertexId":"sales_db.sales_asia.name",
               "metadata":{  
                  "tableName":"sales_db.sales_asia",
                  "tableCreateTime":1560884919
               }
            }
         ]
      }
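To show what the record above resolves to once `metadata.tableName` is used, here is a rough sketch that models the four vertices and two edges as plain Java objects and prints one source-to-target pair per edge. The `Vertex` and `Edge` classes and all method names are hypothetical stand-ins, not Atlas or Impala types, and the record is hard-coded rather than parsed from JSON.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the lineage record; for illustration only.
final class LineageRecordSketch {
    static final class Vertex {
        final int id;
        final String vertexId;   // may be "id" or "sales_db.sales_asia.id"
        final String tableName;  // metadata.tableName, always "db.table" in the sample record
        Vertex(int id, String vertexId, String tableName) {
            this.id = id;
            this.vertexId = vertexId;
            this.tableName = tableName;
        }
    }

    static final class Edge {
        final int[] sources;
        final int[] targets;
        Edge(int[] sources, int[] targets) {
            this.sources = sources;
            this.targets = targets;
        }
    }

    // Qualify a column from its vertex metadata instead of parsing the vertexId prefix.
    static String qualify(Vertex v) {
        String column = v.vertexId.substring(v.vertexId.lastIndexOf('.') + 1);
        return v.tableName + "." + column;
    }

    // One "source -> target" line per (source, target) pair of every edge.
    static List<String> columnLineage(List<Vertex> vertices, List<Edge> edges) {
        Map<Integer, String> nameById = new HashMap<>();
        for (Vertex v : vertices) {
            nameById.put(v.id, qualify(v));
        }
        List<String> pairs = new ArrayList<>();
        for (Edge e : edges) {
            for (int s : e.sources) {
                for (int t : e.targets) {
                    pairs.add(nameById.get(s) + " -> " + nameById.get(t));
                }
            }
        }
        return pairs;
    }

    // The vertices and edges of the sample record, copied by hand.
    static List<String> sampleLineage() {
        List<Vertex> vertices = Arrays.asList(
            new Vertex(0, "id", "sales_db.sales_china"),
            new Vertex(1, "sales_db.sales_asia.id", "sales_db.sales_asia"),
            new Vertex(2, "name", "sales_db.sales_china"),
            new Vertex(3, "sales_db.sales_asia.name", "sales_db.sales_asia"));
        List<Edge> edges = Arrays.asList(
            new Edge(new int[] {1}, new int[] {0}),
            new Edge(new int[] {3}, new int[] {2}));
        return columnLineage(vertices, edges);
    }

    public static void main(String[] args) {
        for (String pair : sampleLineage()) {
            System.out.println(pair);
        }
        // prints:
        // sales_db.sales_asia.id -> sales_db.sales_china.id
        // sales_db.sales_asia.name -> sales_db.sales_china.name
    }
}
```

Vertices 0 and 2 carry only the short names `id` and `name`, so they can be qualified only through their metadata; vertices 1 and 3 give the same result either way.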
      
      

      Attachments

        1. ATLAS-3290.001.patch
          9 kB
          Na Li

            People

              Assignee: linaataustin Na Li
              Reporter: linaataustin Na Li