  1. Solr
  2. SOLR-12974

RandomSort not consistent in SolrCloud Mode

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 6.5.1
    • Fix Version/s: None
    • Component/s: SolrCloud
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      The expected behaviour of RandomSort is that, given the same random field name (random_<seed>), which acts as a seed, the sort order remains consistent as long as the version of the Solr index is unchanged.

      From schema.xml:

      <!-- The "RandomSortField" is not used to store or search any data. You can declare fields of this type it in your schema to generate pseudo-random orderings of your docs for sorting or function purposes. The ordering is generated based on the field name and the version of the index. As long as the index version remains unchanged, and the same field name is reused, the ordering of the docs will be consistent. If you want different psuedo-random orderings of documents, for the same version of the index, use a dynamicField and change the field name in the request. -->

       

      In master-slave mode, replication is driven by the index version. If a slave's index version differs from the master's, the slave replicates the index and its index version is updated to match the master's.

      However, in SolrCloud mode we have observed that replicas of the same shard do not always have the same index version, even though their documents are the same and consistent.
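One way to confirm this observation is to ask each replica core for its index version and compare the values. The sketch below is a minimal illustration, not an official tool. It assumes the Luke request handler is reachable at `/admin/luke` on every replica core and that its JSON response contains a numeric `version` entry for the index. The core URLs passed on the command line are hypothetical, and the class and method names are invented for this example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplicaVersionCheck {

    // Assumption: the index info section of the response carries "version": <number>.
    private static final Pattern VERSION =
            Pattern.compile("\"version\"\\s*:\\s*(\\d+)");

    // Build the (assumed) Luke handler URL for one replica core.
    static String versionUrl(String coreUrl) {
        return coreUrl + "/admin/luke?show=index&wt=json";
    }

    // Extract the first numeric "version" value from the JSON text.
    static long parseIndexVersion(String json) {
        Matcher m = VERSION.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no version field in response");
        }
        return Long.parseLong(m.group(1));
    }

    // Fetch and parse the index version reported by one replica core.
    static long fetchIndexVersion(HttpClient client, String coreUrl) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(versionUrl(coreUrl))).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return parseIndexVersion(response.body());
    }

    public static void main(String[] args) throws Exception {
        // Each argument is a replica core URL (hypothetical), one per replica of the shard.
        HttpClient client = HttpClient.newHttpClient();
        for (String coreUrl : args) {
            System.out.println(coreUrl + " -> index version " + fetchIndexVersion(client, coreUrl));
        }
    }
}
```

If the printed versions differ between replicas of the same shard, the replicas are in the state described in this report.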

      This has previously been discussed on the mailing list as well:

      SolrCloud works very differently than the old master-slave replication.

      The index is NOT copied from the leader to the other replicas, except
      in extreme recovery circumstances.

      Each replica builds its own copy of the index independently from the
      others. Due to slight timing differences in the indexing operations,
      and possible actions related to transaction log replay on node restart,
      each replica may end up with a different index layout. There also could
      be differences in the number of deleted documents. Unless something
      goes really wrong, all replicas should contain the same live documents.

       

      When a query is sent to a shard that has two or more replicas, any one of the replicas may be chosen to respond. If the replicas do not all have the same index version, RandomSort will generate a different random hash seed on each of them for the same random_<seed> field name.

      The source code of the RandomSortField class (line 86) uses the index version (of the shard) to create the random hash seed.
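The effect of mixing the index version into the seed can be illustrated with a small stand-alone sketch. It is a simplified model and not the Solr source: it assumes the seed is the sum of the field name's hash code, the segment doc base, and the index version truncated to an int, and it uses an illustrative integer mixing function in place of whatever hash Solr applies. The class name, method names, and version numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RandomSeedSketch {

    // Assumed seed recipe: field-name hash + segment doc base + index version (as int).
    static int seed(String fieldName, int docBase, long indexVersion) {
        return fieldName.hashCode() + docBase + (int) indexVersion;
    }

    // Illustrative integer mixing function; always returns a non-negative value.
    static int hash(int key) {
        key = ~key + (key << 15);
        key ^= (key >>> 12);
        key += (key << 2);
        key ^= (key >>> 4);
        key *= 2057;
        key ^= (key >>> 16);
        return key >>> 1;
    }

    // Order doc ids 0..numDocs-1 by their pseudo-random value for the given field and version.
    static List<Integer> order(String fieldName, long indexVersion, int numDocs) {
        final int s = seed(fieldName, 0, indexVersion);
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < numDocs; doc++) {
            docs.add(doc);
        }
        docs.sort(Comparator.comparingInt(doc -> hash(doc + s)));
        return docs;
    }

    public static void main(String[] args) {
        // Two replicas holding the same documents but reporting different index versions.
        System.out.println("replica A (version 100): " + order("random_abc", 100L, 10));
        System.out.println("replica B (version 101): " + order("random_abc", 101L, 10));
        // The same replica queried twice yields the same ordering.
        System.out.println("replica A again:         " + order("random_abc", 100L, 10));
    }
}
```

In this model the ordering is stable only while both the field name and the index version stay fixed, so two replicas that disagree on the version start from different seeds even when their documents are identical.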

      Hence, for the same query against a Solr collection, Solr returns different results depending on the version mismatch between replicas and on which replica serves each request.

       

      Example of a Solr query where the random field is used:

      https://solr-stage.mydomain.com:8983/solr/mycollection/select?wt=json&q=*:*&defType=edismax&fl=id&boost=if(query({!v='documentDate:[2018-11-07 TO *]'}),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),1),sub(1,div(1,1))),if(or(exists(query({!v='documentType:sponsored'})),exists(query({!v='documentType:featured'}))),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),4),sub(1,div(1,4))), if(or(exists(query({!v='documentType:listing'})),exists(query({!v='documentType:promotional'}))),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),2),sub(1,div(1,2))),scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1))))
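The nested boost expression in the query above is easier to follow once its arithmetic is written out. Each branch has the form sum(div(scale(random,0,1),n),sub(1,div(1,n))), which is r/n + (1 - 1/n) for a scaled random value r in [0, 1]. The sketch below only reproduces that arithmetic in plain Java; the class and method names are hypothetical, and it does not call Solr.

```java
class BoostBandSketch {

    // Mirrors sum(div(r, n), sub(1, div(1, n))) from the query: r/n + (1 - 1/n).
    static double boost(double r, int n) {
        return r / n + (1.0 - 1.0 / n);
    }

    public static void main(String[] args) {
        // Divisors taken from the query, in the order the if() branches are evaluated.
        String[] labels = {"documentDate in range", "sponsored or featured", "listing or promotional"};
        int[] divisors = {1, 4, 2};
        for (int i = 0; i < divisors.length; i++) {
            double low = boost(0.0, divisors[i]);   // r = 0, the smallest scaled random value
            double high = boost(1.0, divisors[i]);  // r = 1, the largest scaled random value
            System.out.printf("%s: boost in [%.2f, %.2f]%n", labels[i], low, high);
        }
        // Documents matching no branch fall through to the plain scaled random value in [0, 1].
    }
}
```

Because every band is driven by the same random field, a change in the random ordering between replicas changes each document's boost within its band and therefore the final ranking.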
      

       

            People

            • Assignee: Unassigned
            • Reporter: Shrey Shivam (shreyshivam)
            • Votes: 0
            • Watchers: 4
