Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-9503

NPE in Replica Placement Rules when using Overseer Role with other rules

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 6.2, 7.0
    • Fix Version/s: None
    • Component/s: Rules, SolrCloud
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None
    • Flags:
      Patch

      Description

      The overseer role introduced in SOLR-9251 works well if there's only a single Rule for replica placement e.g.

      rule=role:!overseer

      but when combined with another rule, e.g.

      rule=role:!overseer&rule=host:*,shard:*,replica:<2

      it can result in a NullPointerException (in Rule.tryAssignNodeToShard)

      This happens because the code builds up a nodeVsTags map, but it only has entries for nodes that have values for all tags used among the rules. This means not enough information is available to other rules when they are being checked during replica assignment. In the example rules above, if we have a cluster of 12 nodes and only 3 are given the Overseer role, the others do not have any entry in the nodeVsTags map because they only have the host tag value and not the role tag value.

      Looking at the code in ReplicaAssigner.getTagsForNodes, it is explicitly only keeping entries that fulfil the constraint of having values for all tags used in the rules. Possibly this constraint was suitable when rules were originally introduced, but the Role tag (used for Overseers) is unlikely to be present for all nodes in the cluster, and similarly for sysprop tags which may or not be set for a node.

      My patch removes this constraint, so the nodeVsTags map contains everything known about all nodes, even if they have no value for a given tag. This allows the rule combination above to work, and doesn't appear to cause any problems with the code paths that use the nodeVsTags map. They handle null values quite well, and the tests pass.

      1. SOLR-9503.patch
        4 kB
        Tim Owen
      2. SOLR-9503.patch
        3 kB
        Tim Owen

        Activity

        Hide
        TimOwen Tim Owen added a comment -

        As an aside, I noticed that `Rule.Operand.GREATER_THAN` seems to be missing an override for `public int compare(Object n1Val, Object n2Val)` .. but compare only appears to be used when sorting the live nodes, so maybe it's not a big deal?

        Show
        TimOwen Tim Owen added a comment - As an aside, I noticed that `Rule.Operand.GREATER_THAN` seems to be missing an override for `public int compare(Object n1Val, Object n2Val)` .. but compare only appears to be used when sorting the live nodes, so maybe it's not a big deal?
        Hide
        TimOwen Tim Owen added a comment -

        Is anyone able to take a look at this fix - maybe Noble Paul? I hope the assumptions I've made in the diff are correct.

        We've been using it in production for a few months, in our custom build of Solr. Would be nice to roll it in upstream.

        Show
        TimOwen Tim Owen added a comment - Is anyone able to take a look at this fix - maybe Noble Paul ? I hope the assumptions I've made in the diff are correct. We've been using it in production for a few months, in our custom build of Solr. Would be nice to roll it in upstream.
        Hide
        noble.paul Noble Paul added a comment -

        I missed it tim. I shall take a look at it tomorrow

        Show
        noble.paul Noble Paul added a comment - I missed it tim. I shall take a look at it tomorrow
        Hide
        noble.paul Noble Paul added a comment -

        Tim Owen The fix looks fine. I could commit it if we could add a JUnit

        Show
        noble.paul Noble Paul added a comment - Tim Owen The fix looks fine. I could commit it if we could add a JUnit
        Hide
        TimOwen Tim Owen added a comment -

        I went through the tests and found that if I added another rule to the existing test for the overseer-role, it would fail as expected with the previous code. That test now passes with the fix, so I've updated my patch with that test change.

        Show
        TimOwen Tim Owen added a comment - I went through the tests and found that if I added another rule to the existing test for the overseer-role, it would fail as expected with the previous code. That test now passes with the fix, so I've updated my patch with that test change.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit cd4f908d5ba223e615920be73285b7c5cc57704a in lucene-solr's branch refs/heads/master from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cd4f908 ]

        SOLR-9503: NPE in Replica Placement Rules when using Overseer Role with other rules

        Show
        jira-bot ASF subversion and git services added a comment - Commit cd4f908d5ba223e615920be73285b7c5cc57704a in lucene-solr's branch refs/heads/master from Noble Paul [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cd4f908 ] SOLR-9503 : NPE in Replica Placement Rules when using Overseer Role with other rules
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 2b66d0cb127b5e3e92a0f988aa7ba10690227ac3 in lucene-solr's branch refs/heads/branch_6x from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2b66d0c ]

        SOLR-9503: NPE in Replica Placement Rules when using Overseer Role with other rules

        Show
        jira-bot ASF subversion and git services added a comment - Commit 2b66d0cb127b5e3e92a0f988aa7ba10690227ac3 in lucene-solr's branch refs/heads/branch_6x from Noble Paul [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2b66d0c ] SOLR-9503 : NPE in Replica Placement Rules when using Overseer Role with other rules

          People

          • Assignee:
            noble.paul Noble Paul
            Reporter:
            TimOwen Tim Owen
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development