SOLR-7366

Can't index example XML docs into the cloud example using bin/post due to regression in ManagedIndexSchema's handling of ResourceLoaderAware objects used by field types

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 5.1
    • Fix Version/s: 5.1, 5.2, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      The managed schema code has regressed on SOLR-6764. We need to fix the regression and add a unit test that catches this type of regression. This is a blocker for 5.1.

      To reproduce:

      bin/solr -e cloud -noprompt
      bin/post -c gettingstarted example/exampledocs/*.xml
      
      1. SOLR-7366.patch
        14 kB
        Steve Rowe
      2. SOLR-7366.patch
        21 kB
        Timothy Potter
      3. SOLR-7366.patch
        15 kB
        Steve Rowe

          Activity

          Steve Rowe added a comment -

          As I mentioned on SOLR-6141:

          This version of the patch modifies ZkIndexSchemaReader.updateSchema() to fully parse the remote changed schema rather than merging the local copy with the remote copy - now that the schema is (almost) fully addressable with the schema API, we can't reliably do such merges.

          But I didn't add ResourceLoaderAware inform'ing to IndexSchema.readSchema(), so newly parsed schemas were not being properly initialized.
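
To illustrate why a missing inform step leaves a freshly parsed schema unusable, here is a minimal, self-contained Java sketch. It does not use Solr or Lucene classes: `Loader`, `LoaderAware`, and `StopWordFilter` are hypothetical stand-ins for a resource loader, the ResourceLoaderAware contract, and a component used by a field type. The assumption is simply that such a component loads its resources only when it is informed.

```java
import java.util.Arrays;
import java.util.List;

class InformSketch {
    /** Hypothetical stand-in for a resource loader: maps a resource name to its contents. */
    interface Loader { String load(String name); }

    /** Hypothetical stand-in for the ResourceLoaderAware contract. */
    interface LoaderAware { void inform(Loader loader); }

    /** Hypothetical component that is unusable until inform() has loaded its word list. */
    static class StopWordFilter implements LoaderAware {
        private final String resourceName;
        private List<String> words; // stays null until inform() runs

        StopWordFilter(String resourceName) { this.resourceName = resourceName; }

        @Override
        public void inform(Loader loader) {
            words = Arrays.asList(loader.load(resourceName).split(","));
        }

        boolean isStopWord(String token) {
            if (words == null) {
                throw new IllegalStateException("inform() was never called");
            }
            return words.contains(token);
        }
    }

    /** Construct, then inform: the component works. */
    static boolean worksWhenInformed() {
        Loader loader = name -> "a,an,the"; // fake resource contents
        StopWordFilter filter = new StopWordFilter("stopwords.txt");
        filter.inform(loader);
        return filter.isStopWord("the");
    }

    /** Construct without informing: the first use fails. */
    static boolean failsWhenNotInformed() {
        StopWordFilter filter = new StopWordFilter("stopwords.txt");
        try {
            filter.isStopWord("the");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("informed works: " + worksWhenInformed());
        System.out.println("uninformed fails: " + failsWhenNotInformed());
    }
}
```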

          Steve Rowe added a comment -

          Patch against trunk adding ResourceLoaderAware inform'ing to every place that does SchemaAware inform'ing, including in IndexSchema.readSchema(). I had to move informResourceLoaderAwareObjectsInChain() and informResourceLoaderAwareObjectsForFieldType() from ManagedIndexSchema to IndexSchema so that they are accessible from IndexSchema.readSchema().

          After I fixed the code, the data triggered a schemaless issue: there is a document (not the first!) in example/exampledocs/ipod_other.xml with an integral value for the weight field. In standalone mode, this isn't an issue because docs are processed sequentially, but in cloud mode, some docs can be indexed on non-coordinator nodes, resulting in out-of-order schema modifications. I ran into just that, with errors like the following in the solr.log:

          ERROR - 2015-04-09 00:06:55.371; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: ERROR: [doc=F8V7067-APL-KIT] Error adding field 'weight'='4.0' msg=For input string: "4.0"
          [...]
          Caused by: java.lang.NumberFormatException: For input string: "4.0"
                  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
                  at java.lang.Long.parseLong(Long.java:441)
          

          So this patch also modifies example/exampledocs/ipod_other.xml to convert the one integral weight field value into a float.
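
The ordering problem can be reproduced in isolation. The following Java sketch is not Solr's field-guessing code: `guessType` and `accepts` are hypothetical helpers that assume a field's type is fixed by the first value seen. Only `Long.parseLong` and `Double.parseDouble` are real JDK calls, and `Long.parseLong("4.0")` is the call that throws the `NumberFormatException` shown in the log.

```java
class TypeGuessSketch {
    /** Hypothetical guesser: the first value seen decides the field type. */
    static String guessType(String firstValue) {
        try {
            Long.parseLong(firstValue);
            return "long";
        } catch (NumberFormatException e) {
            return "double";
        }
    }

    /** Returns true if the value can be parsed as the given (hypothetical) field type. */
    static boolean accepts(String type, String value) {
        try {
            if (type.equals("long")) {
                Long.parseLong(value);
            } else {
                Double.parseDouble(value);
            }
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sequential order: the float arrives first, so the later integral value still fits.
        String sequential = guessType("4.0");
        System.out.println(sequential + " accepts 4: " + accepts(sequential, "4"));

        // Out-of-order: the integral value arrives first, so the later float is rejected.
        String outOfOrder = guessType("4");
        System.out.println(outOfOrder + " accepts 4.0: " + accepts(outOfOrder, "4.0"));
    }
}
```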

          I tested the patch manually against trunk and lucene_solr_5_1 using the repro instructions in the description; both succeeded.

          Timothy Potter added a comment -

          Patch updated with unit test that catches this problem, i.e. it passes with Steve's fix applied, but fails without.

          Steve Rowe added a comment -

          I refactored the patch, pulling out post-read inform'ing into a method IndexSchema.postReadInform(), overridden in ManagedIndexSchema to also do the ResourceLoaderAware inform'ing so that IndexSchema.readSchema() doesn't cause non-managed schemas to have to do unnecessary inform'ing.
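
The shape of this refactoring is a template-method hook. The sketch below is a rough illustration only, not the committed code: `BaseSchema` and `ManagedSchema` are hypothetical stand-ins for IndexSchema and ManagedIndexSchema, the step names are placeholders, and the relative order of the two inform steps is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class PostReadInformSketch {
    /** Hypothetical stand-in for IndexSchema. */
    static class BaseSchema {
        final List<String> steps = new ArrayList<>();

        /** Parses the schema, then calls the overridable post-read hook. */
        final void readSchema() {
            steps.add("parse");
            postReadInform();
        }

        /** Default hook: only the informing that every schema needs. */
        protected void postReadInform() {
            steps.add("schemaAware");
        }
    }

    /** Hypothetical stand-in for ManagedIndexSchema. */
    static class ManagedSchema extends BaseSchema {
        /** Adds the extra informing that only managed schemas need. */
        @Override
        protected void postReadInform() {
            super.postReadInform();
            steps.add("resourceLoaderAware"); // order relative to the base step is assumed
        }
    }

    /** Runs readSchema() and returns the recorded steps. */
    static List<String> stepsFor(BaseSchema schema) {
        schema.readSchema();
        return schema.steps;
    }

    public static void main(String[] args) {
        System.out.println(stepsFor(new BaseSchema()));
        System.out.println(stepsFor(new ManagedSchema()));
    }
}
```

With this layout the non-managed schema never runs the extra step, which is the point of moving it out of the shared read path.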

          I ran all Solr tests on trunk, including Tim's new test, and all passed.

          Committing shortly.

          ASF subversion and git services added a comment -

          Commit 1672238 from Steve Rowe in branch 'dev/trunk'
          [ https://svn.apache.org/r1672238 ]

          SOLR-7366: fix regression in ManagedIndexSchema's handling of ResourceLoaderAware objects used by field types, causing example XML docs to not be indexable via bin/post; add a test indexing example docs that fails without the patch and succeeds with it

          ASF subversion and git services added a comment -

          Commit 1672239 from Steve Rowe in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1672239 ]

          SOLR-7366: fix regression in ManagedIndexSchema's handling of ResourceLoaderAware objects used by field types, causing example XML docs to not be indexable via bin/post; add a test indexing example docs that fails without the patch and succeeds with it (merged trunk r1672238)

          ASF subversion and git services added a comment -

          Commit 1672240 from Steve Rowe in branch 'dev/branches/lucene_solr_5_1'
          [ https://svn.apache.org/r1672240 ]

          SOLR-7366: fix regression in ManagedIndexSchema's handling of ResourceLoaderAware objects used by field types, causing example XML docs to not be indexable via bin/post; add a test indexing example docs that fails without the patch and succeeds with it (merged trunk r1672238)

          Steve Rowe added a comment -

          Committed to trunk, branch_5x and lucene_solr_5_1.

          Timothy Potter added a comment -

          Bulk close after 5.1 release


            People

            • Assignee:
              Steve Rowe
              Reporter:
              Timothy Potter
            • Votes:
              0
              Watchers:
              3
