Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
None
Description
It is not uncommon for a Hive deployment to use a large number of staging/temporary tables, which are created periodically to load data into target tables and deleted after completion of data load. A large number of entities are created in Atlas for these staging/temporary tables (tables/columns/column-lineage).
For staging tables, it is probably not useful to track details like columns and column-lineage in Atlas. Skipping these details can significantly reduce the time it takes to process notifications and improve overall performance. Only the minimum details of these staging tables need to be stored in Atlas, enough to capture data lineage from the source table to the target table via all intermediate staging tables.
It would also be helpful to ignore tables that are created and deleted during data loading, i.e. temporary tables.
Configurations should be provided to specify which tables are staging/temporary. In addition to supporting this in the Hive hook (to avoid generating large messages for staging/temporary tables), the Atlas server should also be updated, so that this can be controlled further on the server side while processing notifications.
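The behaviour requested above could be sketched roughly as follows. This is only an illustration and not the code that was committed for this issue: the class `StagingTableFilter`, the `Action` enum and the idea of passing two lists of regular expressions are hypothetical names chosen for this sketch, and it assumes tables are identified by a `database.table` name. Tables matching an ignore pattern would be skipped entirely, tables matching a prune pattern would be recorded without columns and column-lineage, and everything else would be tracked in full.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch; not the actual Atlas hook or server code. */
final class StagingTableFilter {
    /** How much detail to record for a table. */
    enum Action { IGNORE, PRUNE, FULL }

    private final List<Pattern> ignorePatterns; // temporary tables: not tracked at all
    private final List<Pattern> prunePatterns;  // staging tables: minimal details only

    StagingTableFilter(List<String> ignoreRegexes, List<String> pruneRegexes) {
        this.ignorePatterns = compile(ignoreRegexes);
        this.prunePatterns = compile(pruneRegexes);
    }

    private static List<Pattern> compile(List<String> regexes) {
        return regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    /** Decides how a table (assumed form: "database.table") should be handled. */
    Action classify(String qualifiedTableName) {
        // Ignore takes precedence over prune when both match.
        if (matchesAny(ignorePatterns, qualifiedTableName)) {
            return Action.IGNORE;
        }
        if (matchesAny(prunePatterns, qualifiedTableName)) {
            return Action.PRUNE;
        }
        return Action.FULL;
    }

    private static boolean matchesAny(List<Pattern> patterns, String name) {
        for (Pattern p : patterns) {
            // matches() requires the whole name to match the pattern.
            if (p.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

With ignore pattern `sales\.tmp_.*` and prune pattern `sales\.stg_.*`, for example, `sales.tmp_load_1` would be ignored, `sales.stg_orders` would be pruned, and `sales.orders` would be tracked in full. The same check could in principle be applied both in the hook and on the server, as the description suggests.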
Attachments
Issue Links
- contains: ATLAS-3085 Provide an option in atlas to disable tracking of specified hive table entities and its lineages (Resolved)