Atlas / ATLAS-1818

Basic Search That Uses indexQuery Takes a Long Time to Fetch Results


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8-incubating, trunk
    • Fix Version/s: 0.8.1, 1.0.0
    • Component/s: atlas-core, atlas-webui
    • Labels: None

    Description

      Background

      The test environment is set up with 100K hive_tables, each with 84 columns.

      When a basic search is executed with the query parameter specified, results take 75 seconds to appear.

      Analysis & Findings

      A similar test with a smaller data set (200 hive_tables, each with 81 columns) also showed less than ideal performance.

      The Atlas Basic Search API uses graph.indexQuery to perform the search, and that call in turn uses Solr.

      Two aspects affect performance:

      • When no limit is specified, Solr returns up to 100K results by default. In the test scenario, this returns the entire result set.
      • Once the result set is returned, EntityDiscoveryService.searchUsingBasicQuery performs a sequential scan to filter the data relevant to the query. The cost of this operation is proportional to the size of the result set.
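      To make the cost model above concrete, here is a minimal, self-contained Java sketch. It is not Atlas code: the Hit record, the simulateIndexQuery helper, and the alternating hive_table/hive_column data are all hypothetical stand-ins. It assumes only what the analysis states, namely that the index query returns up to a capped number of hits and that every returned hit is then examined once in a sequential pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of post-query filtering: the index query returns up to
// maxResults hits, and every returned hit is examined in a sequential pass.
public class SequentialScanCost {

    // Minimal stand-in for an index hit; this is NOT an Atlas type.
    record Hit(String typeName, String name) {}

    // Filters the hits into 'out' and returns how many hits were examined.
    static int examinedCount(List<Hit> indexResults, Predicate<Hit> relevant, List<Hit> out) {
        int examined = 0;
        for (Hit hit : indexResults) {   // linear pass over the whole result set
            examined++;
            if (relevant.test(hit)) {
                out.add(hit);
            }
        }
        return examined;
    }

    // Simulates an index query that returns min(totalHits, maxResults) hits.
    // The alternating type names are fabricated purely for illustration.
    static List<Hit> simulateIndexQuery(int totalHits, int maxResults) {
        int returned = Math.min(totalHits, maxResults);
        List<Hit> hits = new ArrayList<>(returned);
        for (int i = 0; i < returned; i++) {
            hits.add(new Hit(i % 2 == 0 ? "hive_table" : "hive_column", "entity_" + i));
        }
        return hits;
    }

    public static void main(String[] args) {
        Predicate<Hit> isTable = h -> h.typeName().equals("hive_table");
        // Illustrative hit count: 100K tables x 84 columns = 8.4M column entities.
        int totalHits = 8_400_000;

        List<Hit> uncapped = new ArrayList<>();
        int examinedDefault = examinedCount(simulateIndexQuery(totalHits, 100_000), isTable, uncapped);
        System.out.println("cap=100000 examined=" + examinedDefault + " kept=" + uncapped.size());

        List<Hit> capped = new ArrayList<>();
        int examinedSmall = examinedCount(simulateIndexQuery(totalHits, 100), isTable, capped);
        System.out.println("cap=100 examined=" + examinedSmall + " kept=" + capped.size());
    }
}
```

      Running it prints "cap=100000 examined=100000 kept=50000" and then "cap=100 examined=100 kept=50". The work done by the scan tracks the size of the returned set rather than the number of relevant hits, which is why lowering the cap and pushing filtering into the query both help.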

      Solution

      The following changes will improve performance:

      • Solr's maximum result set size is governed by the property atlas.graph.index.search.max-result-set-size. Setting it to a lower number makes sense.
      • Modify Solr's configuration file, solrconfig.xml, to use FastLRUCache.
      • Modify EntityDiscoveryService.searchUsingBasicQuery to form a query that takes additional parameters.
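      The first and third changes can be illustrated with a minimal Java sketch. It assumes a Titan/JanusGraph-style index query string, where a property is addressed as v."key":(value), and it assumes the property keys "__typeName" and "name". The class, its helper methods, and the fallback limit of 150 are all hypothetical and are not taken from the patch. The sketch carries the type name and the search text inside the query itself, so Solr can do the filtering. It also caps the requested limit with the value of atlas.graph.index.search.max-result-set-size, which avoids fetching everything and scanning it afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: push filters into the index query and cap the row count.
public class BasicQuerySketch {

    static final String MAX_RESULT_SET_SIZE = "atlas.graph.index.search.max-result-set-size";
    // Assumed fallback for this sketch only; NOT Atlas's actual default.
    static final int FALLBACK_MAX_RESULTS = 150;

    // Reads the cap from configuration, using the fallback when the property is absent.
    static int maxResultSetSize(Properties conf) {
        String value = conf.getProperty(MAX_RESULT_SET_SIZE);
        return value == null ? FALLBACK_MAX_RESULTS : Integer.parseInt(value.trim());
    }

    // Backslash-escapes characters that Lucene/Solr query syntax treats specially.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if ("+-&|!(){}[]^\"~*?:\\/ ".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds a query that carries the extra parameters (type name, search text).
    // The v."<key>": field syntax and both property keys are assumptions.
    static String buildQuery(String typeName, String queryText) {
        List<String> clauses = new ArrayList<>();
        if (typeName != null && !typeName.isEmpty()) {
            clauses.add("v.\"__typeName\":(" + escape(typeName) + ")");
        }
        if (queryText != null && !queryText.isEmpty()) {
            clauses.add("v.\"name\":(" + escape(queryText) + ")");
        }
        // Lucene match-all syntax; assumed to be accepted by the backend.
        return clauses.isEmpty() ? "*:*" : String.join(" AND ", clauses);
    }

    // The requested page size never exceeds the configured cap;
    // a non-positive request means "no limit given", so the cap applies.
    static int effectiveLimit(int requestedLimit, Properties conf) {
        int cap = maxResultSetSize(conf);
        return requestedLimit <= 0 ? cap : Math.min(requestedLimit, cap);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(MAX_RESULT_SET_SIZE, "100");
        System.out.println(buildQuery("hive_table", "sales"));
        System.out.println(effectiveLimit(0, conf));
        System.out.println(effectiveLimit(25, conf));
    }
}
```

      With the cap set to 100, main prints v."__typeName":(hive_table) AND v."name":(sales), then 100, then 25. The point of the design is that the backend applies both the filter and the limit, so the amount of data returned to EntityDiscoveryService stays bounded regardless of repository size.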

      Attachments

        1. ATLAS-1818-4.patch (31 kB, Ashutosh Mestry)


            People

              Assignee: Ashutosh Mestry
              Reporter: Ashutosh Mestry
              Votes: 0
              Watchers: 3


                Time Tracking

                  Original Estimate: 120h
                  Remaining Estimate: 24h
                  Time Spent: 96h