[ATLAS-1818] Performance of Basic Search that Uses indexQuery Takes Long Time to Fetch Results - ASF JIRA

The test environment is set up with 100K hive_tables, each with 84 columns.

A basic search with the query parameter specified is executed. Results take 75 seconds to appear.

A similar test with a smaller data set (200 hive_tables, each with 81 columns) also showed less-than-ideal performance.

The Atlas Basic Search API uses graph.indexQuery to perform the search, which in turn uses Solr.

Two aspects affect performance:

When no limit is specified, Solr's default maximum result set size is 100K. In the test scenario, this returns the entire result set.
Once the result set is returned, EntityDiscoveryService.searchUsingBasicQuery does a sequential scan to filter the data relevant to the query. The cost of this operation is proportional to the size of the result set.
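
The cost difference between the two approaches can be illustrated with a minimal Java sketch. It is not the Atlas implementation: the class BasicSearchSketch, the Hit type, and the typeName field in the query string are all hypothetical names chosen for illustration. The first method filters after fetching, so it touches every hit the index returned; the second builds a query string that carries the type constraint, so the index could do the filtering instead.

```java
import java.util.ArrayList;
import java.util.List;

public class BasicSearchSketch {
    /** Hypothetical stand-in for one index hit; not an Atlas class. */
    static final class Hit {
        final String typeName;
        final String name;

        Hit(String typeName, String name) {
            this.typeName = typeName;
            this.name = name;
        }
    }

    /**
     * Post-filtering: visits every hit the index returned and keeps those
     * of the requested type. Work grows linearly with the result set size.
     */
    static List<Hit> filterAfterFetch(List<Hit> hits, String typeName) {
        List<Hit> matches = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.typeName.equals(typeName)) {
                matches.add(hit);
            }
        }
        return matches;
    }

    /**
     * Alternative: put the type constraint into the query string itself.
     * The field name "typeName" is an assumption, not the real index key.
     */
    static String queryWithType(String text, String typeName) {
        return "typeName:" + typeName + " AND (" + text + ")";
    }
}
```

With a 100K-hit result set, filterAfterFetch inspects all 100K hits even when only a handful match, whereas a constrained query leaves that work to the index.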

The following changes will improve performance:

Solr's maximum result set size is governed by atlas.graph.index.search.max-result-set-size. It makes sense to set this to a lower number.
Modify Solr's configuration solrconfig.xml to use FastLRUCache.
Modify EntityDiscoveryService.searchUsingBasicQuery to form a query that takes additional parameters.
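
As a sketch of the first change, the property named above could be lowered in the Atlas application properties. The value 150 is only an illustrative assumption; the issue does not state a recommended number.

```properties
# Illustrative value only; choose a limit that suits the deployment.
atlas.graph.index.search.max-result-set-size=150
```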

Estimated: 120h
Remaining: 24h
Logged: 96h