Solr
  1. Solr
  2. SOLR-3428

SolrCmdDistributor flushAdds/flushDeletes problems

    Details

      Description

      A few problems with SolrCmdDistributor.flushAdds/flushDeletes

      • If number of AddRequests/DeleteRequests in alist/dlist is below limit for a specific node the method returns immediately and doesnt flush for subsequent nodes
      • When returning immediately because there is below limit requests for a given node, then previous nodes that have already been flushed/submitted are not removed from adds/deletes maps (causing them to be flushed/submitted again the next time flushAdds/flushDeletes is executed)
      • The idea about just combining params does not work for SEEN_LEADER params (and probably others as well). Since SEEN_LEADER cannot be expressed (unlike commitWithin and overwrite) for individual operations in the request, you need to sent two separate submits. One containing requests with SEEN_LEADER=true and one with SEEN_LEADER=false.

        Activity

        Hide
        Per Steffensen added a comment -

        Primitive patch comming up...

        Show
        Per Steffensen added a comment - Primitive patch comming up...
        Hide
        Per Steffensen added a comment -

        Patch available as part of SOLR-3173_3178_3382_3428_plus.patch on SOLR-3178 - basically as most of the changes in SolrCmdDistributor.

        Show
        Per Steffensen added a comment - Patch available as part of SOLR-3173 _3178_3382_3428_plus.patch on SOLR-3178 - basically as most of the changes in SolrCmdDistributor.
        Hide
        Mark Miller added a comment -

        Any chance you could break out a test case and fix for this from that? I'd like to see this fixed sooner rather than later, and that patch is fairly large with a lot of unrelated changes in the other issue.

        Show
        Mark Miller added a comment - Any chance you could break out a test case and fix for this from that? I'd like to see this fixed sooner rather than later, and that patch is fairly large with a lot of unrelated changes in the other issue.
        Hide
        Per Steffensen added a comment -

        Well first of all ClassicConsistencyHybridUpdateSemanticsSolrCloudTest (in the patch) is a test that protects the fix from being broken, because it depends on the issues mentioned in the description above being fixed. So if you commit the entire patch attached to SOLR-3178 you will have the fix for the issues AND a good test (even though its core focus in not the issues in this SOLR-3428) protecting if from being broken again by accident in the future.

        One reason that you probably never found the issues before now, is that it is really hard to detect that requests are forwarded (from non-leader to leader or from leader to replica) multiple times or not at all, if requests always result in idempotent-that-cant-fail-operations only is carried out on server-side. And that has always been the case for update requests until I introduced versioning and unique key constraint errors. Using those features update requests are no longer idempotent and fail-free - an update that succeeds nicely the first time it is carried out will not succeed if it is carried out a second time (DocumentAlreadyExists-, DocumentDoesNotExist- or VersionConflict-error will occur). Because of the fact that updates are idempotent and fail-free, and because we have not finegrained (per document) and typed error propagation (before my patch), it will be kinda hard to construct a (not too artificial, with a lot of mocking in order to simulate unrealistic behaviour) test showing those issues mentioned in the description above (and protecting the fix from being broken in the future). So I guess I would rather not try.

        Instead I really encourage you to have that patch committed. It provides well done features and fixes. I know it is a big patch covering many issues and I have learned for the future to try not to bring so big patches, but basically all the issues are kinda related - versioning and the ability to fail on unique key constraint violations (SOLR-3173 and SOLR-3178) is really not worth much without the ability to report the typed errors back to the client per document (SOLR-3382) and the bugs in this SOLR-3428 is kinda harmless and therefore hard to provoke in a test as long as everything behind an update request is idempotent and fail-free (which it is before SOLR-3173, SOLR-3178 and SOLR-3382). I really think it is worth it that one or two of you committers spent a day or two understanding the features and fixes in the patch - it will save you much more than the one or two days later, if you have to make the features and fix the bugs yourself.

        So bacially I think it would be a waste of (a lot) of time trying to create a poor (because it will have to be very artificial and unrealistic) test for the issues in this SOLR-3428. IMHO you should put your effort into doing whatever it takes to get the big patch committed.

        Regards, Per Steffensen

        Show
        Per Steffensen added a comment - Well first of all ClassicConsistencyHybridUpdateSemanticsSolrCloudTest (in the patch) is a test that protects the fix from being broken, because it depends on the issues mentioned in the description above being fixed. So if you commit the entire patch attached to SOLR-3178 you will have the fix for the issues AND a good test (even though its core focus in not the issues in this SOLR-3428 ) protecting if from being broken again by accident in the future. One reason that you probably never found the issues before now, is that it is really hard to detect that requests are forwarded (from non-leader to leader or from leader to replica) multiple times or not at all, if requests always result in idempotent-that-cant-fail-operations only is carried out on server-side. And that has always been the case for update requests until I introduced versioning and unique key constraint errors. Using those features update requests are no longer idempotent and fail-free - an update that succeeds nicely the first time it is carried out will not succeed if it is carried out a second time (DocumentAlreadyExists-, DocumentDoesNotExist- or VersionConflict-error will occur). Because of the fact that updates are idempotent and fail-free, and because we have not finegrained (per document) and typed error propagation (before my patch), it will be kinda hard to construct a (not too artificial, with a lot of mocking in order to simulate unrealistic behaviour) test showing those issues mentioned in the description above (and protecting the fix from being broken in the future). So I guess I would rather not try. Instead I really encourage you to have that patch committed. It provides well done features and fixes. I know it is a big patch covering many issues and I have learned for the future to try not to bring so big patches, but basically all the issues are kinda related - versioning and the ability to fail on unique key constraint violations ( SOLR-3173 and SOLR-3178 ) is really not worth much without the ability to report the typed errors back to the client per document ( SOLR-3382 ) and the bugs in this SOLR-3428 is kinda harmless and therefore hard to provoke in a test as long as everything behind an update request is idempotent and fail-free (which it is before SOLR-3173 , SOLR-3178 and SOLR-3382 ). I really think it is worth it that one or two of you committers spent a day or two understanding the features and fixes in the patch - it will save you much more than the one or two days later, if you have to make the features and fix the bugs yourself. So bacially I think it would be a waste of (a lot) of time trying to create a poor (because it will have to be very artificial and unrealistic) test for the issues in this SOLR-3428 . IMHO you should put your effort into doing whatever it takes to get the big patch committed. Regards, Per Steffensen
        Hide
        Mark Miller added a comment -

        I've committed the simple fix for the flush issue and added a test.

        Show
        Mark Miller added a comment - I've committed the simple fix for the flush issue and added a test.
        Hide
        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Show
        Robert Muir added a comment - rmuir20120906-bulk-40-change
        Hide
        Robert Muir added a comment -

        moving all 4.0 issues not touched in a month to 4.1

        Show
        Robert Muir added a comment - moving all 4.0 issues not touched in a month to 4.1
        Hide
        Mark Miller added a comment -

        This can likely be closed - at some point I fixed 1 and 2 (adding Per to the CHANGES entries) and added targeted tests for them and I'm not yet convinced that 3 is a problem (which doesn't guarantee it's not - but I have not found a situation where it is a problem).

        Show
        Mark Miller added a comment - This can likely be closed - at some point I fixed 1 and 2 (adding Per to the CHANGES entries) and added targeted tests for them and I'm not yet convinced that 3 is a problem (which doesn't guarantee it's not - but I have not found a situation where it is a problem).

          People

          • Assignee:
            Per Steffensen
            Reporter:
            Per Steffensen
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:

              Time Tracking

              Estimated:
              Original Estimate - 24h
              24h
              Remaining:
              Remaining Estimate - 24h
              24h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development