SOLR-7384

Delete-by-id with _route_ parameter fails on replicas for collections with implicit router

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.1
    • Fix Version/s: 5.2, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      The FullSolrCloudDistribCmdsTest test has been failing quite regularly on Jenkins. Some of those failures are spurious, but there is an underlying bug: delete-by-id requests with the "_route_" parameter on a collection with the implicit router fail on replicas because of a missing "_version_" field.

      Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12286/
      Java: 32bit/jdk1.9.0-ea-b54 -server -XX:+UseConcMarkSweepGC

      1 tests failed.
      FAILED: org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test

      Error Message:
      Error from server at http://127.0.0.1:44672/implicit_collection_without_routerfield_shard1_replica1: no servers hosting shard:

      Stack Trace:
      org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:44672/implicit_collection_without_routerfield_shard1_replica1: no servers hosting shard:
      at __randomizedtesting.SeedInfo.seed([944EEE25A6B2D153:1C1AD1FF084EBCAB]:0)
      at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:557)
      at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
      at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
      at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)

      1. FullSolrCloudDistribCmdsTest.log
        1.56 MB
        Shalin Shekhar Mangar
      2. FullSolrCloudDistribCmdsTest-2.log
        1.65 MB
        Shalin Shekhar Mangar

          Activity

          Shalin Shekhar Mangar added a comment -

          Full plain text log from http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12286/consoleText

          ASF subversion and git services added a comment -

          Commit 1673176 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1673176 ]

          SOLR-7384: Fix spurious failures in FullSolrCloudDistribCmdsTest

          ASF subversion and git services added a comment -

          Commit 1673177 from shalin@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1673177 ]

          SOLR-7384: Fix spurious failures in FullSolrCloudDistribCmdsTest

          Shalin Shekhar Mangar added a comment -

          The test was creating collections and proceeding to make queries without ensuring that the nodes had recovered. This was causing the failure where a search request failed because it could find no 'active' and live shard. I added a waitForRecoveriesToFinish in two places in this test so that this situation is avoided. Longer term, we need to refactor our test framework code so that this is done automatically.
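The wait described above can be illustrated with a minimal, self-contained sketch. This is not the test framework's waitForRecoveriesToFinish; the class RecoveryWait, the method waitForAllActive, and the state strings are assumptions made for illustration. It only shows the idea of polling replica states until all of them are active before issuing a query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Solr's test framework.
class RecoveryWait {

    /**
     * Polls replicaStates until every replica reports "active" or the
     * timeout elapses. Returns true only if all replicas became active.
     */
    static boolean waitForAllActive(Supplier<List<String>> replicaStates,
                                    long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            List<String> states = replicaStates.get();
            if (!states.isEmpty() && states.stream().allMatch("active"::equals)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated cluster: one replica is still recovering for the first two polls.
        int[] polls = {0};
        Supplier<List<String>> states = () -> ++polls[0] < 3
                ? Arrays.asList("active", "recovering")
                : Arrays.asList("active", "active");

        if (waitForAllActive(states, 1000, 10)) {
            System.out.println("all replicas active, safe to query");
        } else {
            System.out.println("timed out waiting for recovery");
        }
    }
}
```

Querying only after such a wait returns true avoids the "no servers hosting shard" error seen when a search runs against a shard with no active replica.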

          Shalin Shekhar Mangar added a comment -

          There are more failures for this test:

          FAILED:  org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test
          
          Error Message:
          expected:<1> but was:<2>
          
          Stack Trace:
          java.lang.AssertionError: expected:<1> but was:<2>
                  at __randomizedtesting.SeedInfo.seed([60FAB0B863999B0A:E8AE8F62CD65F6F2]:0)
                  at org.junit.Assert.fail(Assert.java:93)
                  at org.junit.Assert.failNotEquals(Assert.java:647)
                  at org.junit.Assert.assertEquals(Assert.java:128)
                  at org.junit.Assert.assertEquals(Assert.java:472)
                  at org.junit.Assert.assertEquals(Assert.java:456)
                  at org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.testDeleteByIdImplicitRouter(FullSolrCloudDistribCmdsTest.java:247)
                  at org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test(FullSolrCloudDistribCmdsTest.java:144)
                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          

          Still digging.

          Shalin Shekhar Mangar added a comment -

          Attaching log for the failure noted in my previous comment.

          Shalin Shekhar Mangar added a comment -

          It looks like the leader is distributing an update without a _version_ field.

             [junit4]   2> 332853 T2394 N:127.0.0.1:45875_uhlt%2Fw C:implicit_collection_without_routerfield S:shard2 R:core_node4 c:implicit_collection_without_routerfield_shard2_replica1 C323 oasup.LogUpdateProcessor.finish [implicit_collection_without_routerfield_shard2_replica1] webapp=/uhlt/w path=/update params={update.distrib=FROMLEADER&distrib.from=http://127.0.0.1:42537/uhlt/w/implicit_collection_without_routerfield_shard2_replica2/&wt=javabin&version=2} {} 0 1
             [junit4]   2> 332853 T2394 N:127.0.0.1:45875_uhlt%2Fw C:implicit_collection_without_routerfield S:shard2 R:core_node4 c:implicit_collection_without_routerfield_shard2_replica1 C323 oasc.SolrException.log ERROR org.apache.solr.common.SolrException: missing _version_ on update from leader
             [junit4]   2> 		at org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1508)
             [junit4]   2> 		at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:1161)
             [junit4]   2> 		at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:125)
             [junit4]   2> 		at org.apache.solr.handler.loader.JavabinLoader.delete(JavabinLoader.java:148)
             [junit4]   2> 		at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:111)
             [junit4]   2> 		at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
             [junit4]   2> 		at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
          
          Shalin Shekhar Mangar added a comment -

          Interestingly, I always see the above exception in the logs even when the test passes. I think the changes made in SOLR-5890 are not correct: a deleteById with a route is sent to the leader, but the leader replicates it without assigning a _version_ field. Since the update still succeeds on the leader, the test passes when the subsequent search query hits the leader, but it fails when the query happens to hit the replica directly.

          The second bug is why the leader even attempts to send such an update (i.e. without a _version_). We should assert such things and make sure that they are not possible at all.
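To make the failure mode and the suggested assertion concrete, here is a simplified model. It is not Solr's DistributedUpdateProcessor: the class VersionedDeleteSketch, DeleteCmd, prepareForReplicas, applyOnReplica, and the convention that version 0 means "unassigned" are all assumptions made for this sketch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of the failure; the names here are hypothetical, not Solr's classes.
class VersionedDeleteSketch {

    /** A delete-by-id command; in this sketch, version 0 stands for "no version assigned". */
    static final class DeleteCmd {
        final String id;
        final String route;
        final long version;

        DeleteCmd(String id, String route, long version) {
            this.id = id;
            this.route = route;
            this.version = version;
        }
    }

    private static final AtomicLong CLOCK = new AtomicLong();

    /** Leader side: assign a version if needed, then refuse to forward an unversioned command. */
    static DeleteCmd prepareForReplicas(DeleteCmd incoming) {
        long version = incoming.version != 0 ? incoming.version : CLOCK.incrementAndGet();
        DeleteCmd outgoing = new DeleteCmd(incoming.id, incoming.route, version);
        if (outgoing.version == 0) {
            throw new AssertionError("leader must not forward a delete without a version");
        }
        return outgoing;
    }

    /** Replica side: reject a forwarded delete that carries no version, as in the logged error. */
    static void applyOnReplica(DeleteCmd cmd) {
        if (cmd.version == 0) {
            throw new IllegalStateException("missing _version_ on update from leader");
        }
        // ... the delete would be applied locally here ...
    }

    public static void main(String[] args) {
        DeleteCmd fromClient = new DeleteCmd("doc1", "shard2", 0);
        try {
            applyOnReplica(fromClient); // models the leader forwarding without a version
        } catch (IllegalStateException e) {
            System.out.println("replica rejected: " + e.getMessage());
        }
        applyOnReplica(prepareForReplicas(fromClient)); // versioned command is accepted
        System.out.println("versioned delete accepted");
    }
}
```

In this model the delete still succeeds on the leader while the replica rejects it, so the two copies diverge, which matches the observation that the test result depends on which node serves the follow-up query.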

          ASF subversion and git services added a comment -

          Commit 1673262 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1673262 ]

          SOLR-7384: Disable the failing tests until the root cause is fixed

          ASF subversion and git services added a comment -

          Commit 1673263 from shalin@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1673263 ]

          SOLR-7384: Disable the failing tests until the root cause is fixed


            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Shalin Shekhar Mangar
            • Votes:
              0
              Watchers:
              3
