Solr > SOLR-9506

cache IndexFingerprint for each segment

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.3
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      The IndexFingerprint is cached per index searcher, which makes it nearly useless during high-throughput indexing, since each newly opened searcher discards the cached value. Caching the fingerprint per segment instead would make computing the fingerprint vastly more efficient.

      1. SOLR-9506-combined-deletion-key.patch
        1 kB
        Ishan Chattopadhyaya
      2. SOLR-9506.patch
        20 kB
        Pushkar Raste
      3. SOLR-9506.patch
        21 kB
        Pushkar Raste
      4. SOLR-9506.patch
        26 kB
        Pushkar Raste
      5. SOLR-9506.patch
        27 kB
        Pushkar Raste
      6. SOLR-9506_POC.patch
        2 kB
        Noble Paul
      7. SOLR-9506_final.patch
        26 kB
        Pushkar Raste

        Activity

        Hide
        praste Pushkar Raste added a comment - - edited

        POC/Initial commit - https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

        There are two issues we still need to solve.

        • How to compute the cumulative versionsHash from the versionsHash of individual segments. We cannot use the current versionsHash (unless we cache all the individual version numbers), as it is not additive. Consider the following scenario:
          Leader segments, versions and versionsHash
          seg1 :
          versions: 100, 101, 102
          versionHash: hash(100) + hash(101) + hash(102)
          seg2:
          versions: 103, 104, 105
          versionHash: hash(103) + hash(104) + hash(105)

          Replica segments, versions and hash
          seg1:
          versions: 100, 101
          versionHash: hash(100) + hash(101)
          seg2:
          versions: 102, 103, 104, 105
          versionHash: hash(102) + hash(103) + hash(104) + hash(105)

          Leader and replica are essentially in sync; however, using the current method there is no way to compute and ensure that the cumulative versionHash of the leader and the replica would match.

          Even if we decide not to cache the IndexFingerprint per segment but just to parallelize the computation, I think we would still run into the issue mentioned above.
        • I still need to figure out how to keep cache in DefaultSolrCoreState, so that we can reuse IndexFingerprint of individual segments when a new Searcher is opened.
        Hide
        noble.paul Noble Paul added a comment -

        Pushkar Raste I've attached a sample program which computes versionsHash for leader and replica using the above example
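
The attached sample program is not inlined in this export; a minimal sketch of the idea, using a made-up per-version hash function (not Solr's real one), might look like the following. Because the per-version hashes are combined by addition, the cumulative versionsHash comes out the same regardless of how the versions are split across segments.

```java
public class AdditiveHashDemo {
    // Hypothetical per-version hash; Solr's real hash function differs.
    public static long hash(long version) {
        long x = version * 0x9E3779B97F4A7C15L;
        return x ^ (x >>> 32);
    }

    // Sum the per-version hashes of every segment; addition is commutative
    // and associative, so the segment boundaries do not matter.
    public static long versionsHash(long[]... segments) {
        long sum = 0;
        for (long[] segment : segments) {
            for (long v : segment) {
                sum += hash(v);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Leader: seg1 = {100, 101, 102}, seg2 = {103, 104, 105}
        long leader = versionsHash(new long[]{100, 101, 102}, new long[]{103, 104, 105});
        // Replica: seg1 = {100, 101}, seg2 = {102, 103, 104, 105}
        long replica = versionsHash(new long[]{100, 101}, new long[]{102, 103, 104, 105});
        System.out.println(leader == replica); // true: same versions, same cumulative hash
    }
}
```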

        Hide
        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        We should keep in mind that previously written segments can change if there are deletes. Maybe we should recompute the per-segment fingerprints upon deletion in that segment.

        Hide
        noble.paul Noble Paul added a comment - - edited

        No. Segments don't change. You mean fingerprints can change?

        Hide
        praste Pushkar Raste added a comment -

        I think what Ishan Chattopadhyaya is hinting at is that if numDocs accounts only for live (active) docs, then once documents are deleted in a segment, the numDocs in the cached fingerprint might be wrong.

        Surprisingly, the following test cases passed with my POC:
        1. PeerSyncTest
        2. PeerSyncReplicationTest
        3. SyncSliceTest

        In the worst case, we can at least parallelize the fingerprint computation.

        Hide
        praste Pushkar Raste added a comment -

        Adding Yonik Seeley in the loop

        Hide
        noble.paul Noble Paul added a comment -

        the cumulative numDocs will be the same anyway

        I guess it can be reproduced as follows

        1. take a 2 replica shard
        2. index and commit multiple times
        3. delete one doc and commit
        4. bring down replica
        5. optimize leader
        6. bring up replica

        I guess this will lead to a full replication

        Hide
        noble.paul Noble Paul added a comment - - edited

        Few quick points

          // Map is not concurrent, but since computeIfAbsent is idempotent, it should be alright for two threads to compute values for the same key.   
          private final Map<LeafReaderContext, Map<Long, IndexFingerprint>> perSegmentFingerprintCache = new WeakHashMap<>();
        

        It's idempotent, but the map has to be thread-safe; the javadocs say that thread safety depends on the map implementation.
        We really don't need to keep a cache per version. The reason is that we only give one version number, and only the latest segment will have to compute anything other than the full fingerprint. As soon as a new segment is added, everything other than the full fingerprint becomes useless. So the solution is: if maxVersion is Long.MAX_VALUE, cache it; else recompute every time. So, the cache should be

          private final Map<LeafReaderContext, IndexFingerprint> perSegmentFingerprintCache = Collections.synchronizedMap(new WeakHashMap<>());
        
        Hide
        praste Pushkar Raste added a comment -

        Discussed with Noble Paul
        We should cache the fingerprint for a segment only if the maxVersion specified is greater than the max version in the segment
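
That policy could be sketched roughly as follows (hypothetical stand-in types: Solr's real code keys on LeafReaderContext and caches IndexFingerprint objects, and the compute method here is a placeholder):

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SegmentFingerprintCache {
    // Thread-safe weak cache, as suggested in the comment above.
    private final Map<Object, Long> cache = Collections.synchronizedMap(new WeakHashMap<>());
    final AtomicInteger computations = new AtomicInteger(); // instrumentation for the demo

    public long getFingerprint(Object segmentKey, long maxVersionInSegment, long maxVersionSpecified) {
        if (maxVersionSpecified > maxVersionInSegment) {
            // The requested bound covers the whole segment, so the segment's
            // fingerprint is stable and can be cached.
            return cache.computeIfAbsent(segmentKey, k -> compute(k, maxVersionSpecified));
        }
        // The bound cuts into this segment: recompute every time.
        return compute(segmentKey, maxVersionSpecified);
    }

    private long compute(Object segmentKey, long maxVersion) {
        computations.incrementAndGet();
        return segmentKey.hashCode() ^ maxVersion; // placeholder for the real computation
    }
}
```

Note that Collections.synchronizedMap also synchronizes the default-method overrides such as computeIfAbsent, which addresses the thread-safety concern raised earlier.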

        Hide
        noble.paul Noble Paul added a comment -

        https://github.com/apache/lucene-solr/pull/84
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        A few random points after browsing this issue...

        We can not use current versionsHash (unless we cache all the individual version numbers), as it is not additive.

        The current versionsHash is additive (it must be, because as you say segments may not line up between leader and replica, and document order may differ). When caching per segment, keep this property by simply adding the segment fingerprints together. Am I missing something here?

        private final Map<LeafReaderContext, ...

        LeafReaderContext objects are not reused between changed indexes, so it would not make an effective cache key.
        Use the core cache key, as FieldCache does.

        We should keep in mind that previously written segments can change if there are deletes.

        Right... the core cache key does not change, even if there are deletes for the segment.
        One way to handle this w/o recomputing everything is:
        1) compute the hash w/o regard to deleted docs
        2) compute the hash of deleted docs only
        3) subtract the two values to obtain your hash
        4) keep track of the numDocs in the segment as well... if that doesn't change, no need to recompute. If it does change, then there were additional deletions. Recompute the hash for the deleted docs.
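
A rough sketch of that bookkeeping with hypothetical types (Solr's IndexFingerprint carries more fields than shown here, and the deleted-docs hash supplier is a placeholder):

```java
import java.util.function.LongSupplier;

// Cache the hash over all docs (ignoring deletes) plus the numDocs observed
// at caching time; recompute only the deleted-docs hash when numDocs drops.
public class DeleteAwareFingerprint {
    final long hashAllDocs; // step 1: hash ignoring deletions (never changes)
    long hashDeleted;       // step 2: hash over deleted docs only
    int numDocsSeen;        // step 4: live-doc count when hashDeleted was computed

    public DeleteAwareFingerprint(long hashAllDocs, long hashDeleted, int numDocs) {
        this.hashAllDocs = hashAllDocs;
        this.hashDeleted = hashDeleted;
        this.numDocsSeen = numDocs;
    }

    public long fingerprint(int currentNumDocs, LongSupplier recomputeDeletedHash) {
        if (currentNumDocs != numDocsSeen) {
            // numDocs changed, so there were additional deletions:
            // redo only the deleted-docs hash, not the whole segment.
            hashDeleted = recomputeDeletedHash.getAsLong();
            numDocsSeen = currentNumDocs;
        }
        return hashAllDocs - hashDeleted; // step 3: subtract to get the live-docs hash
    }
}
```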

        Hide
        praste Pushkar Raste added a comment -

        Updated patch; added a scenario in PeerSyncTest about a replica missing an update.
        Looks like we don't need to remove the live-docs check: if (liveDocs != null && !liveDocs.get(doc)) continue;

        Hide
        praste Pushkar Raste added a comment -

        I also found some weird behavior: if I use parallelStream to compute segment fingerprints in parallel and then reduce them to the index fingerprint on the index searcher, the test fails. Why should the order of computation and reduction matter in this case?

        Hide
        praste Pushkar Raste added a comment -

        I computed the hash without regard to deleted docs and cached it. All the tests pass even without doing steps #2 and #3. I also verified that the index fingerprint computed on the entire index matches the one computed from the individual segments (even after deletions).

        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit bb907a2983b4a7eba8cb4d527a859f1b312bdc79 in lucene-solr's branch refs/heads/master from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bb907a2 ]

        • SOLR-9506: cache IndexFingerprint for each segment
        Hide
        noble.paul Noble Paul added a comment -

        Which test? I did not find it.

        Hide
        praste Pushkar Raste added a comment -

        I did not upload the patch with parallelStream. In SolrIndexSearcher, where we compute and cache the per-segment IndexFingerprint, try switching from stream() to parallelStream() and you will see PeerSyncTest fail.
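
Independent of Solr, the order sensitivity observed here is what the Stream API documents: reduce() only guarantees a deterministic result under parallelStream() when the accumulator is associative. A small illustration:

```java
import java.util.List;

public class ReduceOrderDemo {
    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3, 4, 5, 6, 7, 8);

        // Subtraction is not associative, so reduce() gives no guarantee under
        // parallelStream(); the result can differ from the sequential one.
        int seq = nums.stream().reduce(0, (a, b) -> a - b);
        int par = nums.parallelStream().reduce(0, (a, b) -> a - b);
        System.out.println(seq + " / " + par); // may differ between the two

        // Addition is associative and commutative, so the parallel result
        // always matches the sequential one.
        int seqSum = nums.stream().reduce(0, Integer::sum);
        int parSum = nums.parallelStream().reduce(0, Integer::sum);
        System.out.println(seqSum == parSum); // true
    }
}
```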

        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Hmmm, why was this committed?
        See my comments regarding deleted documents that were never addressed. What was committed will now result in incorrect fingerprints being returned.

        Hide
        praste Pushkar Raste added a comment -

        I don't see why caching the IndexFingerprint per segment and using it later would be any different from computing the IndexFingerprint on the entire index by going through one segment at a time.

        I tried to come up with scenarios where the caching solution would fail and the original solution would not, but could not think of any.

        Hide
        yseeley@gmail.com Yonik Seeley added a comment - - edited

        "Right... the core cache key does not change, even if there are deletes for the segment."

        So the cache key ignores deleted documents, while the value being cached does not. It's a fundamental mismatch.

        Hide
        praste Pushkar Raste added a comment -

        i.e., we really need to fix the IndexFingerprint computation, whether or not we cache. I will open a separate issue to fix it in that case.

        Hide
        k317h Keith Laban added a comment -

        Are you implying that if you add a document, commit it, compute the index fingerprint, and cache the segment fingerprints, then delete that document, commit that change, and compute the fingerprint again using the cached segment fingerprints, you will end up with the same index fingerprint?

        Hide
        yseeley@gmail.com Yonik Seeley added a comment -


        Yep.

        Hide
        praste Pushkar Raste added a comment -

        I think what Yonik is implying is that if, for some reason, a replica does not apply a delete properly, the index fingerprint would still check out, and that would be a problem.

        Considering the issues with PeerSync, should we add an option like recoverWithReplicationOnly? For most setups I doubt people have hundreds of thousands of records in the updateLog, in which case almost no one is using PeerSync anyway.

        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Pretty simple to try out:

        bin/solr start -e techproducts
        
        http://localhost:8983/solr/techproducts/query?q=*:*
          "response":{"numFound":32,"start":0,"docs":[
        
        http://localhost:8983/solr/techproducts/get?getFingerprint=9223372036854775807
        {
          "fingerprint":{
            "maxVersionSpecified":9223372036854775807,
            "maxVersionEncountered":1548538118066405376,
            "maxInHash":1548538118066405376,
            "versionsHash":8803836617561505377,
            "numVersions":32,
            "numDocs":32,
            "maxDoc":32}}
        
        curl http://localhost:8983/solr/techproducts/update?commit=true -H "Content-Type: text/xml" -d '<delete><id>apple</id></delete>'
        
        # this shows that the delete is visible
        http://localhost:8983/solr/techproducts/query?q=*:*
          "response":{"numFound":31,"start":0,"docs":[
        
        #fingerprint returns the same thing
        http://localhost:8983/solr/techproducts/get?getFingerprint=9223372036854775807
        {
          "fingerprint":{
            "maxVersionSpecified":9223372036854775807,
            "maxVersionEncountered":1548538118066405376,
            "maxInHash":1548538118066405376,
            "versionsHash":8803836617561505377,
            "numVersions":32,
            "numDocs":32,
            "maxDoc":32}}
        
        bin/solr stop -all
        bin/solr start -e techproducts
        
        #after a restart, fingerprint returns something different
        http://localhost:8983/solr/techproducts/get?getFingerprint=9223372036854775807
        {
          "fingerprint":{
            "maxVersionSpecified":9223372036854775807,
            "maxVersionEncountered":1548538118066405376,
            "maxInHash":1548538118066405376,
            "versionsHash":-1315083666674066080,
            "numVersions":31,
            "numDocs":31,
            "maxDoc":32}}
        
        
        Hide
        praste Pushkar Raste added a comment -

        There is a lot of confusion going on here. Would the above test fail if we did not cache the per-segment IndexFingerprint?
        If yes, then we should revert the commit; if not, we should open a new issue to fix the IndexFingerprint computation altogether.

        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Not sure I understand... are you suggesting a workaround in PeerSync (recoverWithReplicationOnly) to work around the correctness problem caused by this commit?

        Hide
        yseeley@gmail.com Yonik Seeley added a comment -


        The above manual test only exhibited this bad behavior after the commit today.

        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 9aa764a54f50eca5a8ef805bdb29e4ad90fcce5e in lucene-solr's branch refs/heads/master from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9aa764a ]

        • SOLR-9506: cache IndexFingerprint for each segment
        Hide
        noble.paul Noble Paul added a comment -

        If the above case fails, let's revert the commit and revisit the fingerprint computation

        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Please do.

        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit ffa5c4ba2c2d6fa6bb943a70196aad0058333fa2 in lucene-solr's branch refs/heads/master from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ffa5c4b ]

        SOLR-9506: reverting the previous commit

        Hide
        k317h Keith Laban added a comment -

        How expensive would it be to check numDocs (#4 in Yonik's comment earlier)? I think this would be the most straightforward and understandable approach.

        Hide
        praste Pushkar Raste added a comment -

        Noble Paul and Yonik Seeley, I was able to put together a test showing that the current implementation is broken.
        I will update the patch with the test and a fix by EOD today.

        Hide
        praste Pushkar Raste added a comment -

        Patch with parallelized computation

        Hide
        praste Pushkar Raste added a comment -

        Don't use the patch with parallelized computation. Parallel streams use a shared fork-join pool; a bad actor can create havoc.

        Hide
        praste Pushkar Raste added a comment -

        Removed maxVersionFingerprintCache from SolrIndexSearcher

        Hide
        dsmiley David Smiley added a comment -

        Parallel streams use a shared fork-join pool. A bad actor can create havoc.

        Interesting. FWIW I found this solution: http://stackoverflow.com/a/22269778/92186 – create a custom ForkJoinPool that you execute within. Definitely non-obvious.
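
A minimal sketch of that workaround (not Solr code): work forked by a parallel stream from inside a task submitted to a private ForkJoinPool runs in that pool rather than in the shared ForkJoinPool.commonPool().

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolDemo {
    public static void main(String[] args) throws Exception {
        // A private pool with 4 workers; the common pool is never touched.
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            // The parallel stream splits its work into ForkJoinTasks; because
            // the pipeline starts inside pool.submit(), those tasks stay here.
            int sum = pool.submit(
                    () -> IntStream.rangeClosed(1, 100).parallel().sum()
            ).get();
            System.out.println(sum); // 5050
        } finally {
            pool.shutdown();
        }
    }
}
```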

        Hide
        praste Pushkar Raste added a comment -

        Yeah, I looked into it. I will try that approach if I can get to it before Noble Paul applies the patch.

        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 184b0f221559eaed5f273b1907e8af07bc95fec9 in lucene-solr's branch refs/heads/master from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=184b0f2 ]

        SOLR-9506: cache IndexFingerprint for each segment

        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 265d425b00181dd384fa963e46dc35b92b7e02c0 in lucene-solr's branch refs/heads/branch_6x from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=265d425 ]

        SOLR-9506: cache IndexFingerprint for each segment

        Hide
        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        While working on SOLR-5944, I realized that the current per-segment caching logic works fine for deleted documents (the per-segment numDocs comparison serves as the criterion for a cache hit/miss). However, if a segment has docValues updates, the same logic is insufficient. It is my understanding that changing the caching key from reader().getCoreCacheKey() to reader().getCombinedCoreAndDeletesKey() would work here, since docValues updates are internally handled through the deletion queue. Attaching a patch for the same.

        Hide
        noble.paul Noble Paul added a comment -

        Ishan, I guess this is already fixed in 6.3, so we may need to open another ticket.

        Hide
        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        I see... I saw it was unresolved and thought it hadn't made it into 6.3 yet. I'll check whether it did, and open a new ticket if that's the case.

        Hide
        ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited

        Can we resolve this issue, since it seems it was released as part of 6.3.0? (I will open another issue for the problem I described two comments above. Added: SOLR-9777.)


          People

          • Assignee:
            noble.paul Noble Paul
            Reporter:
            noble.paul Noble Paul
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development