SOLR-8728: Splitting a shard of a collection created with a rule fails with NPE

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5.1, 6.0, 6.1, 7.0
    • Component/s: None
    • Labels:
      None

      Description

      Spinoff from this discussion: http://markmail.org/message/f7liw4hqaagxo7y2

      I wrote a short test that reproduces the problem; I will upload it shortly.

      Attachments
      1. SOLR-8728.patch (10 kB, Noble Paul)
      2. SOLR-8728.patch (2 kB, Shai Erera)

          Activity

          Shai Erera added a comment:

          If you run this test it fails and you can see this exception in the console:

          42071 ERROR (OverseerThreadFactory-6-thread-2-processing-n:127.0.0.1:49954_yd_dma%2Fkz) [n:127.0.0.1:49954_yd_dma%2Fkz    ] o.a.s.c.OverseerCollectionMessageHandler Error executing split operation for collection: shardSplitWithRule parent shard: shard1
          java.lang.NullPointerException
          	at org.apache.solr.cloud.rule.Rule.getNumberOfNodesWithSameTagVal(Rule.java:166)
          	at org.apache.solr.cloud.rule.Rule.tryAssignNodeToShard(Rule.java:128)
          	at org.apache.solr.cloud.rule.ReplicaAssigner.tryAPermutationOfRules(ReplicaAssigner.java:249)
          	at org.apache.solr.cloud.rule.ReplicaAssigner.tryAllPermutations(ReplicaAssigner.java:201)
          	at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings0(ReplicaAssigner.java:173)
          	at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings(ReplicaAssigner.java:134)
          	at org.apache.solr.cloud.Assign.getNodesViaRules(Assign.java:215)
          	at org.apache.solr.cloud.Assign.getNodesForNewReplicas(Assign.java:178)
          	at org.apache.solr.cloud.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:2164)
          	at org.apache.solr.cloud.OverseerCollectionMessageHandler.splitShard(OverseerCollectionMessageHandler.java:1388)
          	at org.apache.solr.cloud.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:236)
          	at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:433)
          	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          	at java.lang.Thread.run(Thread.java:745)
          
          Noble Paul added a comment:

          Thanks a lot, Shai Erera. I shall fix this right away.

          Noble Paul added a comment:

          Summary of changes:

          • ReplicaAssigner should only look up tags for nodes participating in the selection. This avoids the NPE.
          • splitShard() preassigns nodes, and that assignment is done without using the ReplicaAssigner. This means rules don't kick in when nodes are assigned during a shard split. In this patch, the preassignment is done using the rules, if they exist.
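The first change can be modelled in a few lines. This is a minimal sketch, not Solr's ReplicaAssigner or Rule code: the class TagSnapshotSketch, its methods, and the "rack" tag are hypothetical, and it assumes the failure mode is a per-node tag map that has no entry for some node the counting loop visits. It contrasts a count that walks every live node (and dereferences a missing entry) with one that only walks the nodes whose tags were actually looked up. It uses the Java 9+ collection factories.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Solr's API: each node maps to its tags (e.g. "rack" -> "r1").
class TagSnapshotSketch {

    // Look up tags only for the nodes taking part in this placement decision.
    // Nodes outside 'participating' get no entry in the snapshot.
    static Map<String, Map<String, String>> lookupTags(
            List<String> participating, Map<String, Map<String, String>> allKnownTags) {
        Map<String, Map<String, String>> snapshot = new HashMap<>();
        for (String node : participating) {
            snapshot.put(node, allKnownTags.getOrDefault(node, Map.of()));
        }
        return snapshot;
    }

    // Assumed failure mode: iterate over every live node while the snapshot only
    // holds participating nodes, so snapshot.get(node) can be null -> NPE.
    static int countUnsafely(List<String> liveNodes,
            Map<String, Map<String, String>> snapshot, String tag, String value) {
        int count = 0;
        for (String node : liveNodes) {
            if (value.equals(snapshot.get(node).get(tag))) count++;
        }
        return count;
    }

    // Safe variant: count only nodes present in the snapshot, i.e. the
    // participating nodes, so no missing entry is ever dereferenced.
    static int countNodesWithSameTagValue(
            Map<String, Map<String, String>> snapshot, String tag, String value) {
        int count = 0;
        for (Map<String, String> tags : snapshot.values()) {
            if (value.equals(tags.get(tag))) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> known = new HashMap<>();
        known.put("n1", Map.of("rack", "r1"));
        known.put("n2", Map.of("rack", "r1"));
        known.put("n3", Map.of("rack", "r2"));

        // Only n1 and n3 participate, e.g. a split placing replicas on a subset.
        Map<String, Map<String, String>> snapshot = lookupTags(List.of("n1", "n3"), known);
        System.out.println(countNodesWithSameTagValue(snapshot, "rack", "r1")); // prints 1

        try {
            countUnsafely(List.of("n1", "n2", "n3"), snapshot, "rack", "r1");
        } catch (NullPointerException e) {
            System.out.println("NPE: n2 has no entry in the snapshot");
        }
    }
}
```

In this model the safe count is 1 because n2 never entered the snapshot, while the unsafe loop fails on n2, mirroring the "partial list of nodes" wording of the commit message.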

          Shalin Shekhar Mangar, I would like you to take a look at the changes made to splitShard().

          Shalin Shekhar Mangar added a comment:

          Thanks, Noble.

          +      List<String> subSliceNames =  new ArrayList<>();
          +      for (int i = 0; i < subSlices.size(); i++) subSliceNames.add(slice + "_" + i);
          

          This seems redundant because the "subSlices" list already has all the sub-slice names.

          The rest looks good!
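Shalin's point can be checked with a tiny example. This is a hedged sketch: it assumes, as the review implies, that sub-slices are named parent + "_" + index (e.g. shard1_0, shard1_1); the variables slice, subSlices and subSliceNames come from the quoted hunk, while the class and method are illustrative only. Rebuilding the names from the index yields exactly the list that subSlices already holds, so the extra loop adds nothing.

```java
import java.util.ArrayList;
import java.util.List;

class SubSliceNamesSketch {

    // The loop from the quoted hunk: derive names from the parent slice and an index.
    static List<String> rebuildNames(String slice, List<String> subSlices) {
        List<String> subSliceNames = new ArrayList<>();
        for (int i = 0; i < subSlices.size(); i++) {
            subSliceNames.add(slice + "_" + i);
        }
        return subSliceNames;
    }

    public static void main(String[] args) {
        String slice = "shard1";
        // Assumed naming convention for the sub-shards produced by a split.
        List<String> subSlices = List.of("shard1_0", "shard1_1");
        // Prints true: the rebuilt list equals subSlices, so subSlices can be used directly.
        System.out.println(rebuildNames(slice, subSlices).equals(subSlices));
    }
}
```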

          Noble Paul added a comment:

          Thanks.

          ASF subversion and git services added a comment:

          Commit 0cd24c5d08678a4cc883381d54089f62d0978b4d in lucene-solr's branch refs/heads/master from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0cd24c5 ]

          SOLR-8728: ReplicaAssigner throws NPE when a partial list of nodes are only participating in replica
          placement. splitshard should preassign nodes using rules, if rules are present

          ASF subversion and git services added a comment:

          Commit 201b8b02a47ccbc7d08222e11b0a3d54f63ce90f in lucene-solr's branch refs/heads/branch_6x from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=201b8b0 ]

          SOLR-8728: ReplicaAssigner throws NPE when a partial list of nodes are only participating in replica
          placement. splitshard should preassign nodes using rules, if rules are present

          ASF subversion and git services added a comment:

          Commit 89524d917adc02018f277df56553546fd11fdf77 in lucene-solr's branch refs/heads/branch_6x from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=89524d9 ]

          SOLR-8728: ReplicaAssigner throws NPE when a partial list of nodes are only participating in replica
          placement. splitshard should preassign nodes using rules, if rules are present

          ASF subversion and git services added a comment:

          Commit b8183c808083fa206fbf5a0bec76807496fec162 in lucene-solr's branch refs/heads/branch_6_0 from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b8183c8 ]

          SOLR-8728: ReplicaAssigner throws NPE when a partial list of nodes are only participating in replica
          placement. splitshard should preassign nodes using rules, if rules are present

          ASF subversion and git services added a comment:

          Commit d0de8cf8733662999e0dcd08cb07445b35632a9d in lucene-solr's branch refs/heads/branch_6_0 from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d0de8cf ]

          SOLR-8728: ReplicaAssigner throws NPE when a partial list of nodes are only participating in replica
          placement. splitshard should preassign nodes using rules, if rules are present

          Noble Paul added a comment:

          Test failure caused by this fix: http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/15/testReport/org.apache.solr.cloud/CollectionTooManyReplicasTest/testDownedShards/
          Shai Erera added a comment:

          This is marked as fixed in 6.0, but should it also be marked for 6.1 (since it's also committed to 6x)?

          What about master – was it not committed to master too? Does it not affect master?

          And lastly, in case we have a 5.5.1, is this considered a bug fix that we'll want to backport?

          Noble Paul added a comment:

          It's fixed on all three. So, isn't it implicit that stuff fixed in 6.0 is also fixed in all releases that come after 6.0?

          5.5.1 makes sense because it is a release prior to 6.0.

          Shai Erera added a comment:

          We usually set the fix version to be e.g. "5.5" and "trunk/master".

          That's because some issues are fixed only in a specific version, e.g. if they only affect that version.

          Noble Paul added a comment:

          I guess this should go to the 5.5 branch as well right away, just in case there is a 5.5.1.

          ASF subversion and git services added a comment:

          Commit c17f80a7a22a4215ebcfe08939b6c0acc30df7c2 in lucene-solr's branch refs/heads/branch_5_5 from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c17f80a ]

          SOLR-8728

          ASF subversion and git services added a comment:

          Commit 4faadb625e91b8ef323724855748258570137084 in lucene-solr's branch refs/heads/branch_5_5 from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4faadb6 ]

          SOLR-8728: ReplicaAssigner throws NPE when a partial list of nodes are only participating in replica
          placement. splitshard should preassign nodes using rules, if rules are present

          Anshum Gupta added a comment:

          Noble, this was committed to branch_5_5, but it seems you missed the change log entry.

          Anshum Gupta added a comment:

          Reopening to add the change log entry to branch_5_5.

          Anshum Gupta added a comment:

          branch_5x

          commit 07f9cf8aee3523a22c92923f9d4e46a297efc455
          Author: anshum <anshum@apache.org>
          Date:   Thu Apr 21 15:59:25 2016 -0700
          
              SOLR-8728: Add missing change log entry for 5.5.1
          

          branch_5_5

          commit 5601f839c5001b1c2cce44b3b6349b1c1de23230
          Author: anshum <anshum@apache.org>
          Date:   Thu Apr 21 15:59:25 2016 -0700
          
              SOLR-8728: Add missing change log entry for 5.5.1
          
          Hoss Man added a comment:

          Manually correcting fixVersion per Step #S5 of LUCENE-7271


            People

            • Assignee: Noble Paul
              Reporter: Shai Erera
            • Votes: 0
              Watchers: 6
