  1. Solr
  2. SOLR-1880

Performance: Distributed Search should skip GET_FIELDS stage if EXECUTE_QUERY stage gets all fields


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 4.8, 6.0
    • Component/s: search
    • Labels: None

    Description

      Right now, a typical distributed search using QueryComponent makes two HTTP requests to each shard:

      1. STAGE_EXECUTE_QUERY sends one HTTP request to each shard to get the top N ids and sort keys, then merges the results to produce a final list of document IDs (PURPOSE_GET_TOP_IDS).
      2. STAGE_GET_FIELDS executes a second HTTP request to each shard to get the document field values for the final list of document IDs (PURPOSE_GET_FIELDS).
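
The two stages above can be sketched roughly as follows. This is a simplified illustration, not Solr's QueryComponent code: `TwoPassSketch`, `ShardHit`, `mergeTopIds`, and `planFieldRequests` are hypothetical names, the sort key is assumed to be a descending score, and the second pass is assumed to contact only the shards that own a winning document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; these are not Solr classes.
class TwoPassSketch {

    /** One hit returned by a shard in the first pass: the id plus its sort key (assumed to be the score). */
    static final class ShardHit {
        final String shard;
        final String id;
        final double score;

        ShardHit(String shard, String id, double score) {
            this.shard = shard;
            this.id = id;
            this.score = score;
        }
    }

    /** First pass: combine each shard's top-N hits and keep the global top N by descending score. */
    static List<ShardHit> mergeTopIds(List<List<ShardHit>> perShardHits, int n) {
        List<ShardHit> all = new ArrayList<>();
        for (List<ShardHit> hits : perShardHits) {
            all.addAll(hits);
        }
        all.sort((a, b) -> Double.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    /** Second pass planning: group the winning ids by shard; each map entry stands for one more HTTP request. */
    static Map<String, List<String>> planFieldRequests(List<ShardHit> merged) {
        Map<String, List<String>> idsByShard = new LinkedHashMap<>();
        for (ShardHit hit : merged) {
            idsByShard.computeIfAbsent(hit.shard, s -> new ArrayList<>()).add(hit.id);
        }
        return idsByShard;
    }
}
```

In this sketch, every entry returned by `planFieldRequests` is a round trip that exists only to fetch field values for ids the coordinator already holds.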

      If the "fl" param is just "id" or just "id,score", all the document data to return has already been fetched by STAGE_EXECUTE_QUERY, so the second STAGE_GET_FIELDS request is completely unnecessary. Eliminating that second HTTP request to each shard can make a big difference in overall performance.

      Also, if the "fl" param only requests id, score, and sort columns, it would probably be cheaper to fetch the final sort column data in STAGE_EXECUTE_QUERY, which has to read the sort column data anyway, and skip STAGE_GET_FIELDS.
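
The condition described here could be checked along these lines. This is a minimal sketch, not the logic of the attached patches: `OnePassCheck`, `parseFieldList`, and `canSkipGetFields` are hypothetical names, "fl" is assumed to be a plain comma- or space-separated list of field names, and an empty "fl" is assumed to mean that all stored fields are wanted.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not code from the attached patches.
class OnePassCheck {

    /** Splits an "fl" value such as "id,score" on commas and whitespace. */
    static Set<String> parseFieldList(String fl) {
        Set<String> fields = new LinkedHashSet<>();
        for (String field : fl.trim().split("[,\\s]+")) {
            if (!field.isEmpty()) {
                fields.add(field);
            }
        }
        return fields;
    }

    /**
     * Returns true when every requested field is one the first pass already has:
     * the unique key, the score, or a sort column.
     */
    static boolean canSkipGetFields(String fl, String uniqueKey, List<String> sortFields) {
        Set<String> requested = parseFieldList(fl);
        if (requested.isEmpty()) {
            return false; // assumption: no explicit fl means all stored fields are wanted
        }
        Set<String> alreadyFetched = new LinkedHashSet<>(sortFields);
        alreadyFetched.add(uniqueKey);
        alreadyFetched.add("score");
        return alreadyFetched.containsAll(requested);
    }
}
```

For example, `canSkipGetFields("id,score", "id", List.of())` returns true, while `canSkipGetFields("id,title", "id", List.of())` returns false because "title" still has to be fetched from the shards.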

      Attachments

        1. ASF.LICENSE.NOT.GRANTED--one-pass-query.patch
          4 kB
          Shawn Smith
        2. ASF.LICENSE.NOT.GRANTED--one-pass-query-v1.4.0.patch
          4 kB
          Shawn Smith
        3. SOLR-1880.patch
          10 kB
          Shalin Shekhar Mangar
        4. SOLR-1880.patch
          10 kB
          Vitaliy Zhovtyuk

            People

              Assignee: Shalin Shekhar Mangar (shalin)
              Reporter: Shawn Smith (ssmith)
              Votes: 0
              Watchers: 10
