  1. Solr
  2. SOLR-646

Configuration properties enhancements in solr.xml

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: 1.4
    • Fix Version/s: 4.9, 6.0
    • Component/s: None
    • Labels: None

    Description

      This patch refers to the 'generalized configuration properties' as specified by HossMan.
      This means configuration & schema files can use expressions based on properties defined in solr.xml.

      Use cases:

      Describe core data directories from solr.xml as properties.
      Share the same schema and/or config file between multiple cores.
      Share reusable fragments of schema & configuration between multiple cores.

      Usage:

      solr.xml

      This solr.xml is used to illustrate using properties for different purposes.

      <solr persistent="true">
        <property name="version" value="1.3"/>
        <property name="lang" value="english, french"/>
        <property name="en-cores" value="en,core0"/>
        <property name="fr-cores" value="fr,core1"/>
        <!-- This experimental feature flag enables schema & solrconfig to include other files --> 
        <property name="solr.experimental.enableConfigInclude" value="true"/>
        <cores adminPath="/admin/cores">
          <core name="${en-cores}" instanceDir="./">
            <property name="version" value="3.5"/>
            <property name="l10n" value="EN"/>
            <property name="ctlField" value="core0"/>
            <property name="comment" value="This is a sample"/>
          </core>
          <core name="${fr-cores}" instanceDir="./">
            <property name="version" value="2.4"/>
            <property name="l10n" value="FR"/>
            <property name="ctlField" value="core1"/>
            <property name="comment" value="Ceci est un exemple"/>
          </core>
        </cores>
      </solr>
      

      version: if you update your solr.xml or your cores for various reasons, it can be useful to keep track of a version. In this example, it is used to define the dataDir for each core.
      en-cores, fr-cores: core aliases; if the list is long or repetitive, it can be convenient to define it once as a property and then use that property as the Solr core name.
      instanceDir: note that both cores use the same instance directory, sharing their configuration and schema. The dataDir is set for each of them from solrconfig.xml.

      solrconfig.xml

      This is where our solr.xml properties are used to define the data directory as a composition of, in our example, the language code stored in l10n and the core version stored in version.

      <config>
        <dataDir>${solr.solr.home}/data/${l10n}-${version}</dataDir>
      ....
      </config>
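
      As a sketch of the expansion, and assuming the core-scope values take precedence over the multicore-scope version (1.3) as described in the technical specifications, the two cores would effectively see the following dataDir elements. ${solr.solr.home} is left unexpanded because its value depends on the installation.

      ```xml
      <!-- effective value for the ${en-cores} core: l10n=EN, version=3.5 -->
      <dataDir>${solr.solr.home}/data/EN-3.5</dataDir>

      <!-- effective value for the ${fr-cores} core: l10n=FR, version=2.4 -->
      <dataDir>${solr.solr.home}/data/FR-2.4</dataDir>
      ```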
      
      schema.xml

      The include element imports a file into the schema (or a solrconfig); this can help de-clutter long schemas or reuse parts.
      The ctlField property just illustrates that a field & its type can be set through properties as well; in our example, we want the 'english' core to refer to an 'english-configured' field and the 'french' core to a 'french-configured' one. After expansion, the type of the field is text-EN or text-FR.

      <schema name="example core ${l10n}" version="1.1">
        <types>
          ...
          <include resource="text-l10n.xml"/>
        </types>

        <fields>
          ...
          <field name="${ctlField}" type="text-${l10n}" indexed="true" stored="true" multiValued="true"/>
        </fields>
        ...
      </schema>
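
      For illustration, assuming the properties from the sample solr.xml, the field declaration would expand as follows in each core (a sketch of the expected result, not output captured from the patch):

      ```xml
      <!-- ${en-cores} core: ctlField=core0, l10n=EN -->
      <field name="core0" type="text-EN" indexed="true" stored="true" multiValued="true"/>

      <!-- ${fr-cores} core: ctlField=core1, l10n=FR -->
      <field name="core1" type="text-FR" indexed="true" stored="true" multiValued="true"/>
      ```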
      

      This schema imports the text-l10n.xml file, which is a fragment; the fragment tag must be present & indicates that the file is meant to be included. Our example only defines different stopwords for each language, but you could of course extend this to stemmers, synonyms, etc.

      <fragment>
        <fieldType name="text-FR" class="solr.TextField" positionIncrementGap="100">
          ...
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-fr.txt"/>
          ...
        </fieldType>
        <fieldType name="text-EN" class="solr.TextField" positionIncrementGap="100">
          ...
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-en.txt"/>
          ...
        </fieldType>
      </fragment>
      

      Alternatively, one can use XML entities with the 'solr:' protocol to the same end, as in:

      <!DOCTYPE schema [
      <!ENTITY textL10n SYSTEM "solr:${l10ntypes}">
      ]>
      <schema name="example core ${l10n}" version="1.1">
        <types>
         <fieldtype name="string"  class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
         <!--include resource="text-l10n.xml"/-->
         &textL10n;
        </types>
        ...
      </schema>
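
      The entity above resolves ${l10ntypes}, which the sample solr.xml does not define. A minimal sketch, assuming the property is meant to name the same fragment file as the include variant, would declare it at the multicore level:

      ```xml
      <solr persistent="true">
        <!-- assumption: l10ntypes names the fragment otherwise pulled in by <include resource="text-l10n.xml"/> -->
        <property name="l10ntypes" value="text-l10n.xml"/>
        <!-- other properties and cores as in the sample solr.xml -->
      </solr>
      ```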
      

      Technical specifications

      solr.xml can define properties at the multicore level & at each core level.
      Properties defined in the multicore scope can override system properties.
      Properties defined in a core scope can override multicore & system properties.
      Property definitions can use expressions to define their name & value; these expressions are evaluated in the context of their outer scope.
      CoreContainer serialization keeps properties as defined; persistence is idempotent (i.e. property expressions are written, not their evaluation).
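
      The scoping rules can be sketched with a small solr.xml. The property names dataRoot and coreData are hypothetical and only serve the illustration; the values stated in the comments follow from the rules above rather than from a run of the patch.

      ```xml
      <solr persistent="true">
        <!-- multicore scope: would override a system property named "version", if one were set -->
        <property name="version" value="1.3"/>
        <!-- the value is an expression; its outer scope is the set of system properties -->
        <property name="dataRoot" value="${solr.solr.home}/data"/>
        <cores adminPath="/admin/cores">
          <core name="core0" instanceDir="./">
            <!-- core scope: overrides the multicore "version", so core0 sees 3.5 -->
            <property name="version" value="3.5"/>
            <!-- evaluated in the outer (multicore) scope, where dataRoot is defined -->
            <property name="coreData" value="${dataRoot}/core0"/>
          </core>
          <core name="core1" instanceDir="./">
            <!-- no override here, so core1 sees the multicore "version", 1.3 -->
          </core>
        </cores>
      </solr>
      ```

      When persisted, such a file would keep value="${dataRoot}/core0" as written, not the evaluated path.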

      The core descriptor properties are automatically defined in each core context, namely:
      solr.core.instanceDir
      solr.core.name
      solr.core.configName
      solr.core.schemaName
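
      A sketch of how these automatic properties might be used, assuming they are substituted in solrconfig.xml like any property defined in solr.xml:

      ```xml
      <config>
        <!-- one data directory per core, derived from the core's own name -->
        <dataDir>${solr.solr.home}/data/${solr.core.name}</dataDir>
      </config>
      ```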

      Coding notes:

      • DOMUtil.java:
        cosmetic changes
        toMapExcept systematically skips 'xml:base' attributes (which may come from entity resolution)
      • CoreDescriptor.java:
        The core descriptor stores properties as expressions rather than as values (and all its members can be property expressions as well), which allows the file to be written as defined, not as evaluated.
        The public getCoreProperties is removed for that reason. (too bad we were in such a rush...)
      • CoreContainer.java:
        changes related to extracting the core names before they are evaluated in load()
        changes related to evaluating core descriptor members before adding them to the core's loader properties
        fix in persistFile, which was not interpreting relative paths correctly
        fix in persist, because properties were not written in the right place
        changes in persist to write expressions (and the core name when it is one)
      • Config.java:
        substituteProperties has been moved out of the constructor, so calls must be explicit.
        added the entity resolver
        added substituteIncludes, which processes <include name.../>
      • SolrConfig.java & IndexSchema.java:
        added explicit calls to substituteIncludes to perform property/include expansion
      • SolrResourceLoader.java:
        cosmetic, changed getCoreProperties to getProperties (since they may come from the CoreContainer)
      • SolrProperties.java:
        schema uses a localization (l10n) property to define an attribute
        persists the file to check it keeps the expression properties
      • QueryElevationComponent.java:
        Needed to explicitly call substituteProperties.

      Attachments

        1. solr-646.patch
          46 kB
          Henri Biestro
        2. solr-646.patch
          71 kB
          Henri Biestro
        3. solr-646.patch
          69 kB
          Henri Biestro
        4. SOLR-646.patch
          21 kB
          Shalin Shekhar Mangar
        5. solr-646.patch
          90 kB
          Henri Biestro
        6. solr-646.patch
          85 kB
          Henri Biestro
        7. solr-646.patch
          84 kB
          Henri Biestro
        8. solr-646.patch
          78 kB
          Henri Biestro
        9. solr-646.patch
          55 kB
          Henri Biestro

            People

              Assignee: Unassigned
              Reporter: Henri Biestro (henrib)