SOLR-8586

Implement hash over all documents to check for shard synchronization

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      An order-independent hash across all of the versions in the index should suffice. The hash itself is pretty easy, but we need to figure out when/where to do this check (for example, I think PeerSync is currently used in multiple contexts and this check would perhaps not be appropriate for all PeerSync calls?)

      1. SOLR-8586.patch
        22 kB
        Yonik Seeley
      2. SOLR-8586.patch
        19 kB
        Yonik Seeley
      3. SOLR-8586.patch
        10 kB
        Yonik Seeley
      4. SOLR-8586.patch
        5 kB
        Yonik Seeley

          Activity

          Ishan Chattopadhyaya added a comment -

          A bloom filter with all versions, maybe?

          Yonik Seeley added a comment -

          A bloom filter would allow one to estimate (with a known error) if a specific version is contained within the index. But it's not clear how we would use that info. All we need here is to know if two indexes are in sync or not.

          I was thinking of something as simple as

          h = 0
          for version in versions:
            h += hash(version)
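
           (For illustration, a minimal Java sketch of this kind of order-independent accumulation; the class name and the 64-bit mixer are illustrative choices, not the actual Solr code.)

           import java.util.List;

           public class VersionFingerprint {
             /**
              * Order-independent combination of per-version hashes: addition is
              * commutative, so the result does not depend on iteration order.
              */
             public static long hashOfVersions(List<Long> versions) {
               long h = 0;
               for (long version : versions) {
                 h += mix(version);  // accumulate a well-mixed hash of each version
               }
               return h;
             }

             /** A simple 64-bit mixer (a variant of the MurmurHash3 finalizer). */
             private static long mix(long v) {
               v ^= v >>> 33;
               v *= 0xff51afd7ed558ccdL;
               v ^= v >>> 33;
               v *= 0xc4ceb9fe1a85ec53L;
               v ^= v >>> 33;
               return v;
             }
           }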
          
           Ishan Chattopadhyaya added a comment - edited

           I see. My initial thought was that a bloom filter from one replica could be compared against a bloom filter from another replica (bitwise), to arrive at the same checking. It could also be re-used later for other purposes, if needed (maybe to find a missing update, i.e. by looping over all updates one replica has and comparing against the bloom filter of the replica that has the missing update; but I haven't thought about this use case carefully enough). However, your approach does what's needed, and comparing two longs is surely faster than comparing two bit arrays (or two arrays of longs), as in the case of a bloom filter.

          Yonik Seeley added a comment -

          My initial thought was that a bloom filter from one replica could be compared against a bloom filter from another replica (bitwise), to arrive at the same checking.

          We'd need to figure out how big of a bloom filter would be needed to avoid a false match (no idea, off the top of my head).

          For adding up good hashes, 64 bits feels like it should be plenty. We could always easily extend that by accumulating in multiple buckets (the bucket being chosen by either a few bits of the hash, or a completely different hash).
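
           (A possible shape for the multi-bucket extension described above, as an illustrative Java sketch; the bucket-selection rule and names are assumptions, not actual Solr code.)

           public class BucketedFingerprint {
             private final long[] buckets;

             public BucketedFingerprint(int numBuckets) {
               this.buckets = new long[numBuckets];
             }

             /** Route each hashed version into one of several accumulators. */
             public void add(long versionHash) {
               int bucket = (int) ((versionHash >>> 60) % buckets.length); // a few high bits pick the bucket
               buckets[bucket] += versionHash;
             }

             /** Two indexes match only if every bucket's sum matches. */
             public boolean matches(BucketedFingerprint other) {
               return java.util.Arrays.equals(this.buckets, other.buckets);
             }
           }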

          Yonik Seeley added a comment -

          Here's a draft patch that implements the hash.
          Still needs cleanups, code that calls it in the right place, tests, etc.

          David Smiley added a comment -

          Could you please clarify what this issue is all about? I don't get it.

          Yonik Seeley added a comment -

          Could you please clarify what this issue is all about? I don't get it.

          Are you familiar with PeerSync? I just linked SOLR-8129 as well.

          PeerSync currently checks for replicas being in-sync by looking at the last 100 updates, and if there are only a few updates missing (judged by a sufficient overlap of those updates) it will grab the missing updates from the peer and then assume that it is in sync. For whatever reason, updates can sometimes get wildly reordered, and looking at the last N updates is not sufficient. Hopefully "Implement hash over all documents to check for shard synchronization" should now make sense?
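
           (A rough, hypothetical Java sketch of that recent-updates overlap check; the real PeerSync logic handles more cases, and the names and threshold here are illustrative.)

           import java.util.ArrayList;
           import java.util.HashSet;
           import java.util.List;
           import java.util.Set;

           public class RecentUpdateCheck {
             /**
              * Returns the peer's recent versions that we are missing, or null if the
              * overlap of recent updates is too small to conclude anything.
              */
             public static List<Long> missingFromUs(List<Long> ourRecent, List<Long> peerRecent) {
               Set<Long> ours = new HashSet<>(ourRecent);
               List<Long> missing = new ArrayList<>();
               int overlap = 0;
               for (long v : peerRecent) {
                 if (ours.contains(v)) overlap++;
                 else missing.add(v);
               }
               if (overlap < peerRecent.size() / 2) return null; // not enough overlap to judge
               return missing; // caller fetches these updates and then assumes it is in sync
             }
           }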

          David Smiley added a comment -

          Ah, ok. I wasn't familiar with PeerSync; thanks for educating me.

          I wonder if adding hashes of the version might be prone to problems if the version of any given document tends to be identical to many other documents if they were added at once, and assuming a timestamp based version. Just throwing that out there; maybe it wouldn't be a problem and/or too unlikely to worry about, all things considered.

          Yonik Seeley added a comment -

          I wonder if adding hashes of the version might be prone to problems if the version of any given document tends to be identical to many other documents

          Nope, versions are unique to a shard (the leader assigns a unique version to every update).

          Stephan Lagraulet added a comment -

           Would it be possible to increase this "100 updates" window, as it seems quite low for heavy indexing use cases?

           Ishan Chattopadhyaya added a comment -

           numRecordsToKeep can be configured. https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-TransactionLog

          Ishan Chattopadhyaya added a comment -

           Just to make sure I understand: would updates (at a replica) that were rejected due to reordering, or older versions that have since been updated, also be counted towards this hash?
           Or, instead, would the fingerprint be the sum of hashes of only the latest versions of all docs?

          Stephan Lagraulet added a comment -

           Thanks, I missed this Solr 5 enhancement...

          Yonik Seeley added a comment -

           The latter. We're looking at what is in the index, and that will only have the last version of every non-deleted document.

          Yonik Seeley added a comment -

          Here's an updated patch that implements fingerprinting on the request side.

          Example:

           $ curl "http://localhost:8983/solr/techproducts/get?getVersions=5&fingerprint=true"
          {
            "versions":[1524634308552163328,
              1524634308550066176,
              1524634308544823296,
              1524634308538531840,
              1524634308533288960],
            "fingerprint":{
              "maxVersionSpecified":9223372036854775807,
              "maxVersionEncountered":1524634308552163328,
              "maxInHash":1524634308552163328,
              "versionsHash":1830505675324363667,
              "numVersions":32,
              "numDocs":32,
              "maxDoc":32}}
          
          Yonik Seeley added a comment -

           So the basic idea is that when a replica coming back up syncs to a leader, it can request a fingerprint in addition to the last leader versions. It can then grab and apply any missing versions, calculate its own fingerprint, and compare for equality.
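
           (Sketched as hypothetical Java, with made-up interfaces standing in for the real PeerSync and leader APIs.)

           import java.util.List;

           public class FingerprintSyncSketch {

             interface Leader {
               List<Long> recentVersions();   // the last N versions, as today
               long fingerprintHash();        // order-independent hash over all versions
             }

             interface LocalCore {
               void applyMissingUpdates(List<Long> peerVersions);
               long computeFingerprintHash();
             }

             static boolean syncWithLeader(LocalCore core, Leader leader) {
               core.applyMissingUpdates(leader.recentVersions()); // grab and apply any missing updates
               // Only report success if the resulting index hashes identically to the leader's;
               // on mismatch the caller falls back to full index replication.
               return core.computeFingerprintHash() == leader.fingerprintHash();
             }
           }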

          Yonik Seeley added a comment -

          PeerSync always returned "true" if the core doing the sync was judged to be either equal to or ahead of the remote core.
          So one outstanding question is: under what circumstances do we change this to only return true on an exact match?

          David Smith added a comment -

          Trying to understand this without knowing the internals of the sync process – apologies in advance if these are dumb questions:

          It isn't stated, but I assume the replica does a full sync if its fingerprint, after sync, does not match the leader's?

          Are there any scale concerns around calculating the fingerprint? Say, if there are 100,000,000 (non-deleted) docs in the index?

          In a high volume situation (1000's updates / sec), will the leader's fingerprint calculation be in perfect sync with the last versions it is communicating to the replica? Thinking about a searcher being refreshed in the middle of this request, or something like that.

          Ishan Chattopadhyaya added a comment -

          The approach looks good to me.

          +    // TODO: this could be parallelized, or even cached per-segment if performance becomes an issue
          

           I am thinking if per-segment caching would conflict with any potential for in-place docValues updates support (SOLR-5944)? I'm saying this based on my assumption that docValues updates re-write the docValues file for a previously written segment. Given that the version field would be a DV field in such a case, would per-segment caching of the fingerprint need to be aware of in-place updates within a segment (whenever that support is built)?

          Yonik Seeley added a comment -

          I am thinking if per-segment caching would conflict with any potential for in-place docValues updates

          Hmmm, excellent thought.
          Previously, if caching by the "core" segment key, one only needed to take into account deletions. In this case we could have just subtracted the hash for each deletion to do per-segment caching. But I don't know how this works with updateable doc values. They may invalidate previous techniques for per-segment caching (for those fields only of course).
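
           (An illustrative Java sketch of that subtract-on-delete caching idea, assuming hashes are combined by plain addition as above; not actual Solr code.)

           public class SegmentFingerprintCache {
             private long cachedSum;     // sum of the hashes of all live versions in the segment
             private boolean computed;

             /** Compute once over the segment's live docs, then adjust incrementally. */
             public long sumFor(long[] segmentVersionHashes) {
               if (!computed) {
                 for (long h : segmentVersionHashes) cachedSum += h;
                 computed = true;
               }
               return cachedSum;
             }

             /**
              * Because the combine step is addition, a newly deleted document can be removed
              * from the cached value by subtracting its hash. In-place docValues updates would
              * break this, since the stored version itself can change under the cache.
              */
             public void onDelete(long deletedVersionHash) {
               cachedSum -= deletedVersionHash;
             }
           }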

          Yonik Seeley added a comment -

          It isn't stated, but I assume the replica does a full sync if its fingerprint, after sync, does not match the leader's?

          right.

          Are there any scale concerns around calculating the fingerprint? Say, if there are 100,000,000 (non-deleted) docs in the index?

          Yes, this needs to be tested. We can do some caching if it's an issue.

          In a high volume situation (1000's updates / sec), will the leader's fingerprint calculation be in perfect sync with the last versions it is communicating to the replica?

          No, but in a high volume situation, we won't be able to sync up by requesting a few missed docs from the leader anyway, so it probably doesn't matter. This is more for both low update scenarios, and for bringing the whole cluster back up.

           Joel Bernstein added a comment - edited

          This is exactly what we need for implementing alerts (SOLR-8577).

          Stephan Lagraulet added a comment -

           I'm trying to gather all SolrCloud-related issues that affect Solr 5.4. Can you assign the SolrCloud component to this issue?

          Yonik Seeley added a comment -

          PeerSync always returned "true" if the core doing the sync was judged to be either equal to or ahead of the remote core.
          So one outstanding question is: under what circumstances do we change this to only return true on an exact match?

          So I think the answer to this is that we're OK, as long as both peers don't end up returning true.

          Yonik Seeley added a comment -

           OK, the code is pretty much done I think... it just needs tests now.
           I didn't change the strategy of any of the code that uses PeerSync. Fingerprinting is on by default, except in SyncStrategy.syncWithReplicas where it is false (this is the leader syncing with its replicas, and nothing is done with failures in any case).

          Yonik Seeley added a comment -

          OK, here's hopefully the complete patch + additional PeerSync tests.

          Mark Miller added a comment -

          Any chaos monkey test results yet?

          Yonik Seeley added a comment -

          Yep, I've been looping a custom version of the HDFS-nothing-safe test that among other things, only does adds, no deletes. It's the same test I've been using all along in SOLR-8129 . I've gotten 66 fails (most due to mismatch with control), but no fails due to shards being out of sync!

          I plan on committing this soon.

          ASF subversion and git services added a comment -

          Commit 629767be0686d39995f2afc1f1f267f9d1a68cef in lucene-solr's branch refs/heads/master from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=629767b ]

          SOLR-8586: add index fingerprinting and use it in peersync

          Erick Erickson added a comment -

          OK, does this mean I can commit SOLR-8500 (after this is committed to 5x)?

          ASF subversion and git services added a comment -

          Commit f6400e9cbb1158178af0b6cb7901a784368ab589 in lucene-solr's branch refs/heads/master from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f6400e9 ]

          SOLR-8586: Fix forbidden APIS; cleanup of imports

          Mark Miller added a comment -

          I think that we should warn that it can result in more often needing to do full index replication for recovery, but I have nothing against it.

          Erick Erickson added a comment -

          Yeah, this is kind of a "use at your own risk in very specialized situations" kind of thing so I'll be sure and include that warning.

          ASF subversion and git services added a comment -

          Commit ff83a400156beb6a8dd2d0845c7f878c28431739 in lucene-solr's branch refs/heads/branch_5x from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff83a40 ]

          SOLR-8586: add index fingerprinting and use it in peersync
          (cherry picked from commit 629767b)

          ASF subversion and git services added a comment -

          Commit 629767be0686d39995f2afc1f1f267f9d1a68cef in lucene-solr's branch refs/heads/lucene-6997 from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=629767b ]

          SOLR-8586: add index fingerprinting and use it in peersync

          ASF subversion and git services added a comment -

          Commit d75abb2539fb62514c506776c1db6182803745bc in lucene-solr's branch refs/heads/branch_5x from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d75abb2 ]

          SOLR-8586: Fix forbidden APIS; cleanup of imports

          ASF subversion and git services added a comment -

          Commit 629767be0686d39995f2afc1f1f267f9d1a68cef in lucene-solr's branch refs/heads/lucene-6835 from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=629767b ]

          SOLR-8586: add index fingerprinting and use it in peersync

          ASF subversion and git services added a comment -

          Commit f6400e9cbb1158178af0b6cb7901a784368ab589 in lucene-solr's branch refs/heads/lucene-6835 from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f6400e9 ]

          SOLR-8586: Fix forbidden APIS; cleanup of imports

          Joel Bernstein added a comment -

           Now that this is in place it may make sense to combine this with Streaming. The first thing I see is to compare hashes between the shards and if there is a difference use the ComplementStream to determine which id's are missing. The missing id's could then be automatically fetched from the source and re-indexed. There could be a DaemonStream that lives inside the collection that performs this check periodically. This could also sort out a situation where none of the shards has the complete truth.

          Yonik Seeley added a comment -

          OK, I did some basic performance testing...
          On an index w/ 5M docs, the first-time fingerprint took 1100ms (most of that time was un-inversion of the version field, which did not use docValues).
          After the first time, subsequent fingerprints took ~55ms

          Yonik Seeley added a comment -

          The first thing I see is to compare hashes between the shards and if there is a difference use the ComplementStream to determine which id's are missing.

          Implementing eventual consistency with this is problematic in a general sense:
          If one shard has an ID and another doesn't, you don't know what the correct state is.
          The other general issue is the inability to actually retrieve an arbitrary document from the index (i.e. all source fields must be stored).

          It may still be useful for add-only systems that do store all source fields... but in that case, we could make things much more efficient by adding in the ability to use hash trees to drastically narrow the ids that need to be communicated.
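
           (One level of such a hash tree, sketched in illustrative Java: ids are bucketed by range, each bucket keeps a combined hash, and only mismatching buckets need id-level comparison; deeper levels just repeat the idea. Names are assumptions, not actual Solr code.)

           import java.util.ArrayList;
           import java.util.List;

           public class HashTreeSketch {
             /** Combined (summed) hashes for fixed id-range buckets. */
             public static long[] leafHashes(long[][] versionHashesPerBucket) {
               long[] leaves = new long[versionHashesPerBucket.length];
               for (int i = 0; i < leaves.length; i++) {
                 for (long h : versionHashesPerBucket[i]) leaves[i] += h;
               }
               return leaves;
             }

             /** Compare two replicas' leaf hashes; only mismatching buckets need their ids exchanged. */
             public static List<Integer> mismatchedBuckets(long[] mine, long[] theirs) {
               List<Integer> diff = new ArrayList<>();
               for (int i = 0; i < mine.length; i++) {
                 if (mine[i] != theirs[i]) diff.add(i);
               }
               return diff;
             }
           }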

          Joel Bernstein added a comment -

           I think there would need to be a system of truth involved, which there often is. The steps would be (see the sketch after this list):

           1) Check the hashes.
           2) If the hashes differ, find the difference in id's.
           3) Refetch those id's from the system of truth. Streaming data from the system of truth is easily done with streams like the JdbcStream, which streams data from a relational database.
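
           (An illustrative Java sketch of that repair loop; the replica and system-of-truth interfaces are placeholders, not real Solr or streaming APIs.)

           import java.util.Set;
           import java.util.TreeSet;

           public class ShardRepairSketch {
             interface Replica {
               long fingerprintHash();
               Set<String> allIds();            // e.g. streamed out of the index
             }
             interface SystemOfTruth {
               void reindex(Set<String> ids);   // refetch the given ids and send them back in
             }

             static void repair(Replica a, Replica b, SystemOfTruth truth) {
               if (a.fingerprintHash() == b.fingerprintHash()) return; // 1) hashes match: nothing to do
               // 2) hashes differ: compute the symmetric difference of the id sets
               Set<String> diff = new TreeSet<>(a.allIds());
               diff.removeAll(b.allIds());
               Set<String> onlyB = new TreeSet<>(b.allIds());
               onlyB.removeAll(a.allIds());
               diff.addAll(onlyB);
               // 3) refetch the differing ids from the system of truth
               truth.reindex(diff);
             }
           }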

           Yonik Seeley added a comment - edited

          Yep, I've been looping a custom version of the HDFS-nothing-safe test that among other things, only does adds, no deletes.

          Update: when I reverted my custom changes to the chaos test (so that it also did deletes), I got a high amount of shard-out-of-sync errors... seemingly even more than before, so I've been trying to track those down. What I saw were issues that did not look related to PeerSync... I saw missing documents from a shard that replicated from the leader while buffering documents, and I saw the missing documents come in and get buffered, pointing to transaction log buffering or replay issues.

          Then I realized that I had tested "adds only" before committing, and tested the normal test after committing and doing a "git pull". In-between those times was SOLR-8575, which was a fix to the HDFS tlog! I've been looping the test for a number of hours with those changes reverted, and I haven't seen a shards-out-of-sync fail so far. I've also done a quick review of SOLR-8575, but didn't see anything obviously incorrect. The changes in that issue may just be uncovering another bug (due to timing) rather than causing one... too early to tell.

          I've also been running the non-hdfs version of the test for over a day, and also had no inconsistent shard failures.

          Yago Riveiro added a comment -

          Is this operation memory bound?

           I'm trying to upgrade my SolrCloud from 5.4 to 5.5.2 and I can only upgrade one node; if I start another node with 5.5.2, the first one dies with an OOM.

           The second node never gets past the phase where it checks whether the replicas are in sync.

           The SolrCloud deployment (2 nodes) has no activity at all; it is a cold repository for archived data (around 5 billion documents).

           Yonik Seeley added a comment - edited

          If the version field doesn't have docValues, then it will be un-inverted (i.e. FieldCache entries will be built to support version lookups, and that does require memory).
          Since version lookups are needed in the course of indexing anyway (to detect update reorders on replicas), this should really just change when these FieldCache entries are created... hence the maximum required amount of memory shouldn't be changed.

           Yago Riveiro added a comment - edited

           My index has 12T of data indexed with 4.0; the version field has only supported docValues since 4.7.

           To upgrade to 5.x I ran lucene-core-5.x over all my data, but with this new feature I need to re-index all my data, because I don't have docValues for the _version_ field and the fingerprint instead uses the un-inverted method, which creates an in-memory structure that doesn't fit in the memory of my servers...

           To be honest, this should never have been done in a minor release... this mandatory feature relies on an optional configuration :/

           I will either stay on 5.4 or spend several months re-indexing data and figuring out how to update production without downtime. Not an easy task.

          Yonik Seeley added a comment -

           You can set the system property solr.disableFingerprint to "true" to disable the fingerprint check.

           If your indexes ever have updates to existing documents, then you're still risking OOMs anyway (the first time a replica detects that an update may be reordered, the FieldCache will be populated for version for that segment). The fingerprint makes that happen up-front (what I meant to say in my previous message was "the maximum required amount of memory shouldn't be changed").

          Yago Riveiro added a comment -

           Then I do not understand how this is possible:

           https://www.dropbox.com/s/a6e2wrmedop7xjv/Screenshot%202016-08-12%2018.19.22.png?dl=0

           Only with 5.5.x and 6.x does the heap grow without bound. Rolling back to 5.4, the amount of memory needed to come up is constant...

           With only one node running 5.5.x I have no problems; when I start a second node with 5.5.x, they never get past the phase where they are checking replica synchronization.


             People

             • Assignee: Yonik Seeley
             • Reporter: Yonik Seeley
             • Votes: 0
             • Watchers: 11
