Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
- Configure the Hive hook in Hive.
- Configure Atlas to talk to Kafka topics on a separately set up Kafka instance.
- Run a script that creates tables in Hive multiple times, e.g.:
for i in `seq 1 10`; do ./bin/hive -e "create table tbl$i (column${i}1 string, column${i}2 int)"; done
- After the script completes, check the number of entities and list the actual entities in the ATLAS_ENTITIES topic.
The topic contains one ENTITY_CREATE event for the Hive database for every table created in Hive. For example:
{"entity":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference","id":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id","id":"e7ed4ad9-1fba-47ad-a089-1ff7e715c1ad","version":0,"typeName":"hive_db"},"typeName":"hive_db","values":{"name":"default","description":"Default Hive database","ownerType":{"value":"ROLE","ordinal":2},"qualifiedName":"primary.default","locationUri":"hdfs://localhost:9000/user/hive/warehouse","ownerName":"public","clusterName":"primary"},"traitNames":[],"traits":{}},"operationType":"ENTITY_CREATE","traits":[]}
{"entity":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference","id":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id","id":"e7ed4ad9-1fba-47ad-a089-1ff7e715c1ad","version":0,"typeName":"hive_db"},"typeName":"hive_db","values":{"name":"default","description":"Default Hive database","ownerType":{"value":"ROLE","ordinal":2},"qualifiedName":"primary.default","locationUri":"hdfs://localhost:9000/user/hive/warehouse","ownerName":"public","clusterName":"primary"},"traitNames":[],"traits":{}},"operationType":"ENTITY_CREATE","traits":[]}
Expected behavior: these extraneous events should not be emitted, because the hive_db entity has not changed.
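One way to check for the duplicates described above is to count ENTITY_CREATE events per entity id in a dump of the ATLAS_ENTITIES topic. The following is a minimal sketch, not part of Atlas: it assumes the topic has been dumped with one JSON message per line, and the function names and the file name `atlas_entities.txt` are hypothetical.

```python
import json
from collections import Counter


def count_create_events(lines):
    """Count ENTITY_CREATE events per (typeName, entity id).

    Assumes each non-empty line is one JSON message shaped like the
    examples in this report (entity.id.id and entity.id.typeName).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("operationType") != "ENTITY_CREATE":
            continue
        entity_id = event["entity"]["id"]
        counts[(entity_id["typeName"], entity_id["id"])] += 1
    return counts


def duplicated_creates(lines):
    """Return only the entities that were reported as created more than once."""
    return {key: n for key, n in count_create_events(lines).items() if n > 1}


if __name__ == "__main__":
    # Hypothetical dump file: one message from ATLAS_ENTITIES per line.
    with open("atlas_entities.txt", encoding="utf-8") as dump:
        for (type_name, guid), n in sorted(duplicated_creates(dump).items()):
            print(f"{type_name} {guid}: {n} ENTITY_CREATE events")
```

With the behavior reported here, the `hive_db` entity for `default` would show up with a count greater than one; once the extraneous events are gone, `duplicated_creates` should return an empty dict for the same reproduction steps.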