Solr > SOLR-3251 dynamically add fields to schema > SOLR-4623

Add REST API methods to get all remaining schema information, and also to return the full live schema in json, xml, and schema.xml formats

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.2
    • Fix Version/s: 4.3, 6.0
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      Each remaining schema component (that is, everything other than the field types, fields, dynamic fields, and copy fields added by SOLR-4503) should be available from the schema REST API: name, version, default query operator, similarity, default search field, and unique key.

      It should be possible to get the entire live schema back with a single request, and schema.xml format should be one of the supported response formats.

      1. JSONResponseWriter.output.json
        41 kB
        Steve Rowe
      2. SchemaXmlResponseWriter.output.xml
        30 kB
        Steve Rowe
      3. SOLR-4623.patch
        170 kB
        Steve Rowe
      4. SOLR-4623-fix-classname-shortening-part-deux.patch
        7 kB
        Steve Rowe
      5. SOLR-4623-fix-classname-shortening-part-deux.patch
        6 kB
        Steve Rowe
      6. XMLResponseWriter.output.xml
        60 kB
        Steve Rowe

        Activity

        Steve Rowe added a comment -

        Patch.

        The full schema is available via the "/schema" path, e.g. http://localhost:8983/solr/collection1/schema. "?wt=json" and "?wt=xml" produce the expected output formats. SchemaXmlResponseWriter is added to provide output in schema.xml format, available via "?wt=schema.xml". Samples attached.
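
A minimal sketch of the three request shapes described above. The class and method names (SchemaUrls, schemaUrl) are hypothetical helpers for illustration and are not part of Solr; the host, port, and core name are taken from the example URL in the comment.

```java
import java.net.URI;

/** Hypothetical helper (not part of Solr) that builds schema REST API URLs. */
final class SchemaUrls {
  /** Core base URL, copied from the example URL in the comment above. */
  static final String CORE_URL = "http://localhost:8983/solr/collection1";

  /** Returns the URI for the full live schema in the given response format. */
  static URI schemaUrl(String coreUrl, String wt) {
    // wt selects the response writer: "json", "xml", or "schema.xml"
    return URI.create(coreUrl + "/schema?wt=" + wt);
  }

  public static void main(String[] args) {
    // Print one URL per supported output format.
    for (String wt : new String[] {"json", "xml", "schema.xml"}) {
      System.out.println(schemaUrl(CORE_URL, wt));
    }
  }
}
```

Each printed URL could then be fetched with any HTTP client against a running Solr instance that includes this patch.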

        Also moves schema REST API methods from package org.apache.solr.rest to org.apache.solr.rest.schema, and renames base functionality REST API classes to remove the current schema focus, to prepare for other non-schema REST APIs.

        Also changes output path for copyFields and dynamicFields from "copyfields" and "dynamicfields" (all lowercase) to "copyFields" and "dynamicFields", respectively, to mirror all other REST API outputs, which use camel-case.

        I think this is ready to go.

        Steve Rowe added a comment -

        Also changes output path for copyFields and dynamicFields from "copyfields" and "dynamicfields" (all lowercase) to "copyFields" and "dynamicFields", respectively, to mirror all other REST API outputs, which use camel-case.

        I want to point out a design choice I've made with the REST API URLs. I followed what appears to me to be a pattern in Solr's URLs: all path elements in lowercase, and query params in either camel-case or dots.separating.words format.

        Steve Rowe added a comment -

        Committed: trunk r1460519, branch_4x r1460521

        Robert Muir added a comment -

        Reopening to ensure my comments are taken seriously

        Steve Rowe added a comment -

        Robert, I replied to you on the mailing list, and I tried to contact you on #lucene IRC.

        You haven't responded in any meaningful way.

        So please help me understand what you don't like and how you think it ought to be fixed.

        Steve Rowe added a comment -

        Robert's comment from the mailing list - I'll commit the patch shortly, as I agree about the bugs it fixes - thanks Robert:

        Well there are several bugs, resulting from the over-aggressive
        normalization combined with normalizing always despite this comment:

        // Only normalize factory names

        So consider the case someone has
        <similarity class="org.apache.lucene.search.similarities.BM25Similarity"/>

        which is allowed (it uses the anonymous factory). In this case it's
        bogusly normalized to "solr.BM25Similarity" which is invalid and won't
        be loaded by IndexSchema, since it only looks for solr.xxxx in
        org.apache.solr.search.similarities.

        I think a patch like the following is a good start, but we should
        review the other uses of the same code-dup'ed function in IndexSchema
        and ensure there are not similar bugs:

        I'm sorry if I came off terse or as a haiku, it's not a big deal, I
        just want it to work correctly.

        Index: solr/core/src/java/org/apache/solr/schema/SimilarityFactory.java
        ===================================================================
        --- solr/core/src/java/org/apache/solr/schema/SimilarityFactory.java	(revision 1460952)
        +++ solr/core/src/java/org/apache/solr/schema/SimilarityFactory.java	(working copy)
        @@ -51,9 +51,9 @@
          public abstract Similarity getSimilarity();
        
        
        -  private static String normalizeSPIname(String fullyQualifiedName) {
        -    if (fullyQualifiedName.startsWith("org.apache.lucene.") || fullyQualifiedName.startsWith("org.apache.solr.")) {
        -      return "solr" + fullyQualifiedName.substring(fullyQualifiedName.lastIndexOf('.'));
        +  private static String normalizeName(String fullyQualifiedName) {
        +    if (fullyQualifiedName.startsWith("org.apache.solr.search.similarities.")) {
        +      return "solr" + fullyQualifiedName.substring("org.apache.solr.search.similarities".length());
            }
            return fullyQualifiedName;
          }
        @@ -66,10 +66,10 @@
              className = getSimilarity().getClass().getName();
            } else {
              // Only normalize factory names
        -      className = normalizeSPIname(className);
        +      className = normalizeName(className);
            }
            SimpleOrderedMap<Object> props = new SimpleOrderedMap<Object>();
        -    props.add(CLASS_NAME, normalizeSPIname(className));
        +    props.add(CLASS_NAME, className);
            if (null != params) {
              Iterator<String> iter = params.getParameterNamesIterator();
              while (iter.hasNext()) {
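
To make the difference concrete, the sketch below copies the two method bodies from the diff into a standalone class and runs both on the class name from Robert's example. The wrapper class name (NormalizeDemo) and the main method are illustrative additions, not part of the patch.

```java
/** Standalone copy of the two methods from the diff above, for comparison only. */
final class NormalizeDemo {
  // Pre-patch behavior: any Lucene or Solr class is shortened to "solr.<SimpleClassName>".
  static String normalizeSPIname(String fullyQualifiedName) {
    if (fullyQualifiedName.startsWith("org.apache.lucene.") || fullyQualifiedName.startsWith("org.apache.solr.")) {
      return "solr" + fullyQualifiedName.substring(fullyQualifiedName.lastIndexOf('.'));
    }
    return fullyQualifiedName;
  }

  // Patched behavior: only names starting with the org.apache.solr.search.similarities
  // package prefix are shortened; everything else is returned unchanged.
  static String normalizeName(String fullyQualifiedName) {
    if (fullyQualifiedName.startsWith("org.apache.solr.search.similarities.")) {
      return "solr" + fullyQualifiedName.substring("org.apache.solr.search.similarities".length());
    }
    return fullyQualifiedName;
  }

  public static void main(String[] args) {
    String bm25 = "org.apache.lucene.search.similarities.BM25Similarity";
    String factory = "org.apache.solr.search.similarities.BM25SimilarityFactory";
    System.out.println(normalizeSPIname(bm25));  // prints "solr.BM25Similarity" (the bug)
    System.out.println(normalizeName(bm25));     // prints the fully qualified name unchanged
    System.out.println(normalizeName(factory));  // prints "solr.BM25SimilarityFactory"
  }
}
```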
        
        Steve Rowe added a comment -

        Robert's comment from the mailing list - I'll commit the patch shortly, as I agree about the bugs it fixes - thanks Robert: [...]

        Patch committed to trunk and branch_4x.

        Steve Rowe added a comment -

        we should review the other uses of the same code-dup'ed function in IndexSchema and ensure there are not similar bugs

        The code-dup'ed function is in FieldType, not IndexSchema, and right now it's used to convert the fully qualified class names of analyzers and analysis components to the short name form "solr.<SimpleClassName>".

        Looking at SolrResourceLoader.findClass(), where analysis component references of the form "solr.<SimpleClassName>" are converted to Class references, I see that this is inappropriate for analyzer classes, since Lucene SPI doesn't cover them. I'll stop shortening analyzer classnames.

        I looked up the currently defined analysis factories in trunk, and all of them are under org.apache.lucene.analysis.** and org.apache.solr.analysis.**. Lucene analysis component factories are loaded via SPI, and Solr analysis factories are discovered by iteratively attempting Class.forName() using a fixed set of package prefixes, including "org.apache.solr.analysis.".
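
The prefix-based lookup described above can be sketched roughly as follows. This is an illustration of the technique only, not the code in SolrResourceLoader.findClass(): the class name PrefixLookup, the method signature, and the unchecked exception are assumptions, and JDK packages stand in for Solr's prefixes so the sketch runs without Solr on the classpath.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of prefix-based class lookup; not Solr's actual implementation. */
final class PrefixLookup {
  /**
   * Resolves "solr.<SimpleClassName>" by trying each package prefix in order
   * with Class.forName(); any other name is loaded exactly as given.
   */
  static Class<?> findClass(String name, List<String> packagePrefixes) {
    if (!name.startsWith("solr.")) {
      try {
        return Class.forName(name);
      } catch (ClassNotFoundException e) {
        throw new IllegalArgumentException("Class not found: " + name, e);
      }
    }
    String simpleName = name.substring("solr.".length());
    for (String prefix : packagePrefixes) {
      try {
        return Class.forName(prefix + simpleName);
      } catch (ClassNotFoundException e) {
        // Not in this package; fall through to the next prefix.
      }
    }
    throw new IllegalArgumentException("No package prefix matched: " + name);
  }

  public static void main(String[] args) {
    // Stand-in prefixes (assumption): the real set includes "org.apache.solr.analysis.".
    List<String> prefixes = Arrays.asList("java.lang.", "java.util.");
    System.out.println(findClass("solr.ArrayList", prefixes).getName());
  }
}
```

The point of the sketch is that a short name only round-trips if the original class lives under one of the probed prefixes, which is why shortening must be limited to those packages.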

        I'll change the acceptable prefixes to "org.apache.lucene.analysis." and "org.apache.solr.analysis.".

        Since SPI isn't used for Solr factories, I'll change the method name from normalizeSPIname() to getShortName(), since "shortname"/"short name" seems to be what "solr.<SimpleClassName>" instances are called. I would change SimilarityFactory.normalizeName() to getShortName() too, but I see it's only called the one time, so I'll inline it and get rid of the method.

        Patch coming shortly.

        Steve Rowe added a comment -

        Patch with the fixes.

        Committing shortly.

        Steve Rowe added a comment -

        Oops, the last patch didn't cover shortening the names of FieldType subclasses, which live under package org.apache.solr.schema, another member of the package prefix set that SolrResourceLoader.findClass() checks. Fortunately a couple of schema REST API tests caught this problem.

        This patch converts the qualification tests in getShortName() to a regex accepting the prefixes "org.apache.lucene.analysis.(whatever)", "org.apache.solr.analysis.", and "org.apache.solr.schema.".

        Committing shortly. For reals this time.
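
A rough sketch of what a regex-based getShortName() along these lines might look like. The actual regex from the patch is not shown in this issue, so the pattern below is an assumption reconstructed from the three prefixes named above; the wrapper class name ShortNames is also hypothetical. Analyzer class names would still match this pattern, so the decision not to shorten them is assumed to be made by the caller.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based short naming; the real patch's regex may differ. */
final class ShortNames {
  // Assumed pattern: org.apache.lucene.analysis plus any number of sub-packages,
  // or exactly org.apache.solr.analysis / org.apache.solr.schema, then a simple class name.
  private static final Pattern SHORTENABLE = Pattern.compile(
      "^org\\.apache\\.(?:lucene\\.analysis(?:\\.[^.]+)*|solr\\.(?:analysis|schema))\\.([^.]+)$");

  /** Returns "solr.<SimpleClassName>" for names under an accepted prefix, else the input. */
  static String getShortName(String fullyQualifiedName) {
    Matcher m = SHORTENABLE.matcher(fullyQualifiedName);
    return m.matches() ? "solr." + m.group(1) : fullyQualifiedName;
  }

  public static void main(String[] args) {
    System.out.println(getShortName("org.apache.lucene.analysis.core.LowerCaseFilterFactory"));
    System.out.println(getShortName("org.apache.solr.schema.TextField"));
    System.out.println(getShortName("org.apache.lucene.search.similarities.BM25Similarity"));
  }
}
```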

        Steve Rowe added a comment -

        Committed to trunk and branch_4x.

        Uwe Schindler added a comment -

        Closed after release.


          People

          • Assignee:
            Steve Rowe
            Reporter:
            Steve Rowe
          • Votes:
            0
            Watchers:
            0
