  1. Solr
  2. SOLR-1880

Performance: Distributed Search should skip GET_FIELDS stage if EXECUTE_QUERY stage gets all fields


    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 4.8, 6.0
    • Component/s: search
    • Labels:


      Right now, a typical distributed search using QueryComponent makes two HTTP requests to each shard:

      1. STAGE_EXECUTE_QUERY sends one HTTP request to each shard to get the top N ids and sort keys, then merges the results to produce a final list of document IDs (PURPOSE_GET_TOP_IDS).
      2. STAGE_GET_FIELDS sends a second HTTP request to each shard to get the document field values for the final list of document IDs (PURPOSE_GET_FIELDS).
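
      To make the merge in step 1 concrete, here is a minimal sketch of combining per-shard top-N lists into one final list of IDs. It is not Solr's actual QueryComponent code: the class ShardMergeSketch, the ShardDoc type, and the mergeTopN method are hypothetical names, and it assumes a sort by descending score with ties broken by id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the real Solr implementation.
final class ShardMergeSketch {

    // One entry of a shard's top-N response: id plus its sort key (here, the score).
    static final class ShardDoc {
        final String id;
        final float score;
        final String shard;

        ShardDoc(String id, float score, String shard) {
            this.id = id;
            this.score = score;
            this.shard = shard;
        }
    }

    // Merge every shard's top-N list and keep the global top n.
    // Assumption: sort by descending score, ties broken by ascending id.
    static List<ShardDoc> mergeTopN(List<List<ShardDoc>> perShard, int n) {
        List<ShardDoc> all = new ArrayList<>();
        for (List<ShardDoc> docs : perShard) {
            all.addAll(docs);
        }
        all.sort((x, y) -> {
            int byScore = Float.compare(y.score, x.score);
            return byScore != 0 ? byScore : x.id.compareTo(y.id);
        });
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

      The merged list records which shard each surviving document came from, which is what the second stage needs in order to ask each shard only for its own documents.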

      If the "fl" param is just "id" or just "id,score", all of the document data to return has already been fetched by STAGE_EXECUTE_QUERY, so the second STAGE_GET_FIELDS request is completely unnecessary. Eliminating that second HTTP request to each shard can make a big difference in overall performance.

      Also, if the "fl" param only requests id, score and sort columns, it would probably be cheaper to fetch the final sort column data in STAGE_EXECUTE_QUERY, which has to read the sort column data anyway, and skip STAGE_GET_FIELDS.
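
      The skip condition described above could be checked roughly as follows. This is a sketch under stated assumptions, not the code from the attached patches: the class OnePassCheckSketch and its methods are hypothetical, "fl" is assumed to be a comma- or whitespace-separated list of plain field names, and an empty or wildcard "fl" is treated as "cannot skip" because it may ask for fields the first stage did not fetch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not the logic from the SOLR-1880 patches.
final class OnePassCheckSketch {

    // Split an "fl" value on commas and whitespace into distinct field names.
    static Set<String> parseFl(String fl) {
        Set<String> fields = new LinkedHashSet<>();
        if (fl == null) {
            return fields;
        }
        for (String f : fl.split("[,\\s]+")) {
            if (!f.isEmpty()) {
                fields.add(f);
            }
        }
        return fields;
    }

    // True when every requested field is the unique key, "score", or a sort
    // field, i.e. data the first stage already has for each document.
    static boolean canSkipGetFields(String fl, String uniqueKey, Set<String> sortFields) {
        Set<String> requested = parseFl(fl);
        if (requested.isEmpty()) {
            return false; // assumption: no explicit fl means more than id/score is wanted
        }
        for (String f : requested) {
            if (f.contains("*")) {
                return false; // wildcards may match fields not fetched in stage one
            }
            boolean alreadyFetched =
                f.equals(uniqueKey) || f.equals("score") || sortFields.contains(f);
            if (!alreadyFetched) {
                return false;
            }
        }
        return true;
    }
}
```

      With a unique key of "id", this returns true for fl=id and fl=id,score, false for fl=id,title, and true for fl=id,price only when "price" is one of the sort fields.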


        1. ASF.LICENSE.NOT.GRANTED--one-pass-query.patch
          4 kB
          Shawn Smith
        2. ASF.LICENSE.NOT.GRANTED--one-pass-query-v1.4.0.patch
          4 kB
          Shawn Smith
        3. SOLR-1880.patch
          10 kB
          Vitaliy Zhovtyuk
        4. SOLR-1880.patch
          10 kB
          Shalin Shekhar Mangar



              • Assignee:
                shalinmangar Shalin Shekhar Mangar
                ssmith Shawn Smith
              • Votes:
                0
              • Watchers:
                10


                • Created: