Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Issue:
An export run during ingestion fails, and the logs report "Found 0 entities".
Here, ingestion means that Atlas is consuming messages.
Steps to Repro:
- Make sure the backend has more than 1M entities.
- Start creating tables under `db1@cm`.
- Start an export for `db1@cm`:
curl -v -X POST -u admin:admin -H "Content-Type: application/json" "http://<>/api/atlas/admin/export" -d '{"itemsToExport":[{"typeName":"hive_db","uniqueAttributes": { "qualifiedName": "db1@cm" }}],"options":{"fetchType":"full","replicatedTo":"cm"}}' > export1.zip
- The export fails after some time.
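The export request from the steps above can also be scripted, which makes it easier to repeat while ingestion is running. The following is a minimal sketch using only the Python standard library. The helper names `build_export_request` and `run_export` and the `base_url` parameter are hypothetical, and the server address is whatever was elided in the curl command.

```python
import base64
import json
import urllib.request


def build_export_request(db_qualified_name, replicated_to):
    """Build the same JSON body that the curl command sends."""
    return {
        "itemsToExport": [
            {
                "typeName": "hive_db",
                "uniqueAttributes": {"qualifiedName": db_qualified_name},
            }
        ],
        "options": {"fetchType": "full", "replicatedTo": replicated_to},
    }


def run_export(base_url, user, password, body, out_path):
    """POST the export request and stream the response to out_path.

    base_url is an assumption: scheme, host and port of the Atlas server.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/atlas/admin/export",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        # Copy the zip in chunks so a large export is not held in memory.
        while True:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
```

A call such as `run_export(base_url, "admin", "admin", build_export_request("db1@cm", "cm"), "export1.zip")` would mirror the curl invocation, with `base_url` supplied by the person running it.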
When is the issue seen?
It occurs when the backend holds a large amount of data and Atlas is consuming messages linked to the entity whose export is running.
Analysis to find the root cause:
- When the backend holds a large amount of data, the export FAILS.
- When the backend holds a large amount of data but there are few tables under the exported database, the export also FAILS.
- If background consumption stops, the export PASSES.
- If the consumed messages are for entities other than those requested in the export, the export PASSES.
- The export uses the query below to find the starting object; its has() clause, which checks that a property exists, is expensive:
g.V().has('_typeName','hive_db').has('Referenceable.qualifiedName','db6@cm').has('__guid').values('__guid')
- has('__guid') queries Solr: [(35x_t <> null)]:vertex_index
- The time taken, as recorded in the Solr logs, is shown below:
2024-06-14 02:38:56.218 INFO (qtp1158676965-19) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=:&stateVer=vertex_index:12&fl=id&start=0&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1681928 status=0 QTime=4227
2024-06-14 02:40:23.945 INFO (qtp1158676965-16) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=:&stateVer=vertex_index:12&fl=id&start=500000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682086 status=0 QTime=787
2024-06-14 02:41:37.703 INFO (qtp1158676965-14) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=:&stateVer=vertex_index:12&fl=id&start=1000000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682216 status=0 QTime=1962
2024-06-14 02:42:20.715 INFO (qtp1158676965-20) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=:&stateVer=vertex_index:12&fl=id&start=1500000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682363 status=0 QTime=4465
- Running the same query through the Gremlin shell while ingestion is in progress does not fail.
- Time taken for the above Gremlin query in code during ingestion: 214825 ms
- Time taken for the above Gremlin query in the Gremlin shell during ingestion: 104641 ms
- Time taken for the above Gremlin query with no ingestion: 181682 ms
The root cause is still unknown.
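The Solr request log entries quoted in the analysis can be summarized with a short script, which may help when comparing runs. This is only a sketch: the regular expression is an assumption fitted to the four entries shown in this ticket, and the names `parse_solr_requests` and `summarize` are hypothetical.

```python
import re
from datetime import datetime

# The four request log entries from this ticket, one per line.
SAMPLE_LOG = """\
2024-06-14 02:38:56.218 INFO (qtp1158676965-19) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=:&stateVer=vertex_index:12&fl=id&start=0&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1681928 status=0 QTime=4227
2024-06-14 02:40:23.945 INFO (qtp1158676965-16) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=:&stateVer=vertex_index:12&fl=id&start=500000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682086 status=0 QTime=787
2024-06-14 02:41:37.703 INFO (qtp1158676965-14) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=:&stateVer=vertex_index:12&fl=id&start=1000000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682216 status=0 QTime=1962
2024-06-14 02:42:20.715 INFO (qtp1158676965-20) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=:&stateVer=vertex_index:12&fl=id&start=1500000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682363 status=0 QTime=4465
"""

# Assumed entry layout: timestamp, INFO, ..., &start=N, ..., hits=N status=N QTime=N
LOG_ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) INFO"
    r".*?&start=(?P<start>\d+)"
    r".*?hits=(?P<hits>\d+) status=\d+ QTime=(?P<qtime>\d+)"
)


def parse_solr_requests(log_text):
    """Extract timestamp, page offset, hit count and QTime from each entry."""
    entries = []
    for match in LOG_ENTRY.finditer(log_text):
        entries.append({
            "time": datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f"),
            "start": int(match.group("start")),
            "hits": int(match.group("hits")),
            "qtime_ms": int(match.group("qtime")),
        })
    return entries


def summarize(entries):
    """Compare time spent inside Solr with the wall-clock span of the pages."""
    return {
        "pages": len(entries),
        "total_qtime_ms": sum(e["qtime_ms"] for e in entries),
        "hits_growth": entries[-1]["hits"] - entries[0]["hits"],
        "elapsed_s": (entries[-1]["time"] - entries[0]["time"]).total_seconds(),
    }


if __name__ == "__main__":
    print(summarize(parse_solr_requests(SAMPLE_LOG)))
```

For the entries in this ticket, the summed QTime is 11441 ms while the four pages are logged over about 204 seconds, and the hit count grows by 435 between the first and last page. This suggests that most of the time is spent outside Solr and that the result set changes while it is being paged, though neither observation establishes the root cause.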
Workaround:
- Remove the .has('__guid') clause from the query below; the query then runs very quickly and the issue is no longer reproducible.
g.V().has('_typeName','hive_db').has('Referenceable.qualifiedName','db6@cm').has('__guid').values('__guid')
Tests:
- Upgraded the TinkerPop and JanusGraph versions, but it didn't help.
- Querying an invalid or non-existent property does not throw an exception.