Solr / SOLR-9751

PreAnalyzedField can cause managed schema corruption

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 6.2, 6.3
    • Fix Version/s: 6.4, 7.0
    • Component/s: Schema and Analysis
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      The exception is as follows:
      Caused by: org.apache.solr.common.SolrException: Could not load conf for core test_shard1_replica1: Can't load schema managed-schema: Plugin init failure for [schema.xml] fieldType "preanalyzed": Cannot load analyzer: org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer
      at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:85)
      at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1031)
      ... 6 more
      Caused by: org.apache.solr.common.SolrException: Can't load schema managed-schema: Plugin init failure for [schema.xml] fieldType "preanalyzed": Cannot load analyzer: org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer
      at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:600)
      at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:183)
      at org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:104)
      at org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:172)
      at org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:45)
      at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:75)
      at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:107)
      at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:78)
      ... 7 more

      Test procedure:
      1. Create a collection using sample_techproducts_configs.
      2. Add a field in the Solr web view.
      3. Add another field in the Solr web view.
      The managed-schema is then modified as follows:
      <fieldType name="preanalyzed" class="solr.PreAnalyzedField">
      <analyzer class=" org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer">
      </analyzer>
      </fieldType>

      1. SOLR-9751.patch
        15 kB
        Steve Rowe
      2. SOLR-9751.patch
        17 kB
        Steve Rowe
      3. SOLR-9751.patch
        8 kB
        Steve Rowe

          Activity

          steve_rowe Steve Rowe added a comment -

          I can reproduce on master. I'm looking into it.

          steve_rowe Steve Rowe added a comment -

          The admin UI is not the problem - I can reproduce by cmdline only.

          The original preanalyzed fieldtype is:

          <fieldType name="preanalyzed" class="solr.PreAnalyzedField">
            <!-- PreAnalyzedField's builtin index analyzer just decodes the pre-analyzed token stream. -->
            <analyzer type="query">
              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            </analyzer>
          </fieldType>
          

          After one field (any field type will do) is added, it becomes:

          <fieldType name="preanalyzed" class="solr.PreAnalyzedField">
            <analyzer>
              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            </analyzer>
          </fieldType>
          

          and after the second field is added:

          <fieldType name="preanalyzed" class="solr.PreAnalyzedField">
            <analyzer class="org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer"/>
          </fieldType>
          
          steve_rowe Steve Rowe added a comment -

          More complete stack trace:

          Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType "preanalyzed": Cannot load analyzer: org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer
                  at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:182)
                  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:491)
                  ... 36 more
          Caused by: org.apache.solr.common.SolrException: Cannot load analyzer: org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer
                  at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:287)
                  at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:104)
                  at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:53)
                  at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
                  ... 37 more
          Caused by: java.lang.InstantiationException: org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer
                  at java.lang.Class.newInstance(Class.java:427)
                  at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:271)
                  ... 40 more
          Caused by: java.lang.NoSuchMethodException: org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer.<init>()
                  at java.lang.Class.getConstructor0(Class.java:3082)
                  at java.lang.Class.newInstance(Class.java:412)
                  ... 41 more
          

          The (private) PreAnalyzedAnalyzer doesn't have a default ctor - its only ctor requires a parser param.

          Note that this ^^ is not really the problem - the problem is that serialization is losing information (the query-time analysis chain) and instead including a built-in non-substitutable analyzer: PreAnalyzedField doesn't allow re-configuration of its index-time analysis chain.
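
The innermost cause in the stack trace above can be illustrated in isolation. The following is a minimal, self-contained sketch, not Solr code: `ParserBackedAnalyzer` and `PlainAnalyzer` are hypothetical stand-ins for an analyzer whose only constructor takes a parameter and for an ordinary analyzer with a no-arg constructor. The sketch uses `getDeclaredConstructor()` rather than the `Class.newInstance()` call seen in the trace, so the `NoSuchMethodException` surfaces directly instead of being wrapped in an `InstantiationException`.

```java
import java.lang.reflect.Constructor;

final class NoDefaultCtorDemo {

    /** Hypothetical stand-in: a private nested class whose only constructor needs an argument. */
    private static final class ParserBackedAnalyzer {
        private final String parserName;

        ParserBackedAnalyzer(String parserName) {
            this.parserName = parserName;
        }
    }

    /** Hypothetical stand-in: a class with an implicit no-arg constructor. */
    static final class PlainAnalyzer { }

    /** Exposes the private class so callers can attempt to instantiate it reflectively. */
    static Class<?> parserBackedAnalyzerClass() {
        return ParserBackedAnalyzer.class;
    }

    /**
     * Tries to create an instance through a no-arg constructor, as a class-name-based
     * loader would. Returns null on success, or the simple name of the exception thrown.
     */
    static String tryNoArgInstantiation(Class<?> clazz) {
        try {
            Constructor<?> ctor = clazz.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return null;
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryNoArgInstantiation(PlainAnalyzer.class));
        System.out.println(tryNoArgInstantiation(parserBackedAnalyzerClass()));
    }
}
```

Any schema entry that names such a class in an `<analyzer class="...">` attribute cannot be loaded this way, which is why the serialized form shown earlier fails on reload.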

          steve_rowe Steve Rowe added a comment -

          Seems to only reproduce in cloud setups - I can repro with bin/solr -e cloud but not bin/solr -e techproducts.

          steve_rowe Steve Rowe added a comment - - edited

          Patch with failing test, no fix yet.

          In a 2-node cluster with a 2-shard, rf=1 collection, the cluster enters a failure loop where it can't read the schema - my IDE ran out of memory trying to store the log output. In this test, whose schema includes three types of PreAnalyzedField config, only one field has to be added before this condition occurs.

          steve_rowe Steve Rowe added a comment -

          Patch with a fix.

          The problem results from a combination of lenient schema parsing and the lack of a concept for non-user-specifiable index-time analyzers.

          This patch adds a new interface HasImplicitIndexAnalyzer, implemented by PreAnalyzedField, and schema parsing and serialization now properly handle this case. As a result, when a field type implements HasImplicitIndexAnalyzer, regardless of the original specified analyzer type, an analyzer (if any) will always be specified as a query-time analyzer, even if it was originally specified as a non-specific or index-time analyzer.
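
The serialization rule described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: the nested `HasImplicitIndexAnalyzer` here is a local marker interface standing in for the one the patch adds, and `TextLikeType`, `PreAnalyzedLikeType` and `serializedAnalyzerType` are hypothetical names introduced only for this example.

```java
final class ImplicitIndexAnalyzerSketch {

    /** Local stand-in for the marker interface named in the patch description. */
    interface HasImplicitIndexAnalyzer { }

    /** Hypothetical field type whose index-time analyzer is user-configurable. */
    static class TextLikeType { }

    /** Hypothetical field type with a built-in, non-substitutable index-time analyzer. */
    static class PreAnalyzedLikeType implements HasImplicitIndexAnalyzer { }

    /**
     * Decides which "type" attribute an analyzer gets when the schema is written back.
     * declaredType is "index", "query", or null for an untyped <analyzer> element.
     */
    static String serializedAnalyzerType(Object fieldType, String declaredType) {
        if (fieldType instanceof HasImplicitIndexAnalyzer) {
            // The index-time chain is implicit, so any user-specified analyzer
            // can only ever apply at query time.
            return "query";
        }
        // Other field types keep whatever the schema declared.
        return declaredType;
    }
}
```

Under this rule an untyped `<analyzer>` on a preanalyzed field type is written back as `<analyzer type="query">`, so repeated schema modifications no longer drift toward the built-in analyzer class.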

          I've also added logged warnings for cases where analyzers are specified in the schema for field types that don't support analyzers (currently non-TextField-s).

          I'll commit once all tests and precommit pass.

          steve_rowe Steve Rowe added a comment -

          Manual testing of the Schema API in the standalone Solr case shows the same problem as in the SolrCloud case, even though it didn't trigger system failure, so the problem is not confined to SolrCloud.

          steve_rowe Steve Rowe added a comment -

          This patch fixes a precommit issue in the previous patch (unused imports) and removes the warnings I added for analyzers specified on non-TextField-s, since FieldType.set{Index,Query}Analyzer() already handles this case as a severe error.

          Precommit and all tests pass. Committing shortly.

          jira-bot ASF subversion and git services added a comment -

          Commit 8989c774783a80cab6902e45f111cfe60ed15d49 in lucene-solr's branch refs/heads/branch_6x from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8989c77 ]

          SOLR-9751: PreAnalyzedField can cause managed schema corruption

          jira-bot ASF subversion and git services added a comment -

          Commit 76b439a0bdf8a3e74f53130571535bbfdec5c771 in lucene-solr's branch refs/heads/master from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=76b439a ]

          SOLR-9751: PreAnalyzedField can cause managed schema corruption

          steve_rowe Steve Rowe added a comment -

          I opened SOLR-9765 to explore dealing with the general case of mixed schema modification request success/failure. (On this issue, modification requests succeeded on the coordinating replica, but failed on other replicas.)

          steve_rowe Steve Rowe added a comment -

          The test I added on this issue has been failing regularly, e.g. from https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/2247/:

            [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=PreAnalyzedFieldManagedSchemaCloudTest -Dtests.method=testAdd2Fields -Dtests.seed=CD72F125201C0C76 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=is -Dtests.timezone=Antarctica/McMurdo -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
            [junit4] ERROR   0.09s J0 | PreAnalyzedFieldManagedSchemaCloudTest.testAdd2Fields <<<
            [junit4]    > Throwable #1: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[https://127.0.0.1:39011/solr/managed-preanalyzed, https://127.0.0.1:33343/solr/managed-preanalyzed]
            [junit4]    > 	at __randomizedtesting.SeedInfo.seed([CD72F125201C0C76:656743CEFC1A9F80]:0)
            [junit4]    > 	at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:414)
            [junit4]    > 	at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1292)
            [junit4]    > 	at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1062)
            [junit4]    > 	at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1004)
            [junit4]    > 	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
            [junit4]    > 	at org.apache.solr.schema.PreAnalyzedFieldManagedSchemaCloudTest.addField(PreAnalyzedFieldManagedSchemaCloudTest.java:61)
            [junit4]    > 	at org.apache.solr.schema.PreAnalyzedFieldManagedSchemaCloudTest.testAdd2Fields(PreAnalyzedFieldManagedSchemaCloudTest.java:52)
            [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
            [junit4]    > Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://127.0.0.1:39011/solr/managed-preanalyzed: No such path /schema/fields/field2
            [junit4]    > 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593)
            [junit4]    > 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
            [junit4]    > 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
            [junit4]    > 	at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:435)
            [junit4]    > 	at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:387)
          

          The test failed in only 1 of 100 beasting iterations on my Jenkins box. The problem appears to be that the test doesn't wait long enough for schema changes to propagate to all replicas before attempting to access a newly created field through a replica that hasn't yet received the changes. I'm testing a patch that adds updateTimeoutSecs=15 to the SchemaRequest.AddField requests used by the test.
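
The race described above comes down to waiting, with a deadline, until every replica has caught up. The sketch below shows that idea only; it is not Solr's implementation of updateTimeoutSecs. The class `SchemaPropagationWait`, the method `awaitSchemaVersion`, and the notion of each replica exposing an integer schema version through an `IntSupplier` are all assumptions made for the example.

```java
import java.util.List;
import java.util.function.IntSupplier;

final class SchemaPropagationWait {

    /**
     * Polls every replica until all report at least expectedVersion, or the timeout elapses.
     * Returns true if all replicas converged in time, false on timeout or interruption.
     */
    static boolean awaitSchemaVersion(List<IntSupplier> replicaVersions,
                                      int expectedVersion,
                                      long timeoutMillis,
                                      long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            boolean allCurrent = true;
            for (IntSupplier replica : replicaVersions) {
                if (replica.getAsInt() < expectedVersion) {
                    allCurrent = false;
                    break;
                }
            }
            if (allCurrent) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test that reads a newly added field only after such a wait returns true would not hit a replica that still serves the old schema.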

          steve_rowe Steve Rowe added a comment -

          The test and fix in SOLR-9832 don't involve PreAnalyzedField, so this issue can be resolved.


            People

            • Assignee:
              steve_rowe Steve Rowe
              Reporter:
              liuyang@huawei liuyang
            • Votes:
              0
              Watchers:
              2
