Solr
  1. Solr
  2. SOLR-646

Configuration properties enhancements in solr.xml

    Details

    • Type: New Feature New Feature
    • Status: In Progress
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4
    • Fix Version/s: 4.9, 5.0
    • Component/s: None
    • Labels:
      None

      Description

      This patch refers to 'generalized configuration properties' as specified by HossMan
      This means configuration & schema files can use expression based on properties defined in solr.xml.

      Use cases:

      Describe core data directories from solr.xml as properties.
      Share the same schema and/or config file between multiple cores.
      Share reusable fragments of schema & configuration between multiple cores.

      Usage:

      solr.xml

      This solr.xml will be used to illustrates using properties for different purpose.

      <solr persistent="true">
        <property name="version" value="1.3"/>
        <property name="lang" value="english, french"/>
        <property name="en-cores" value="en,core0"/>
        <property name="fr-cores" value="fr,core1"/>
        <!-- This experimental feature flag enables schema & solrconfig to include other files --> 
        <property name="solr.experimental.enableConfigInclude" value="true"/>
        <cores adminPath="/admin/cores">
          <core name="${en-cores}" instanceDir="./">
      	  <property name="version" value="3.5"/>
      	  <property name="l10n" value="EN"/>
      	  <property name="ctlField" value="core0"/>
      	  <property name="comment" value="This is a sample"/>
      	</core>
          <core name="${fr-cores}" instanceDir="./">
      	  <property name="version" value="2.4"/>
      	  <property name="l10n" value="FR"/>
      	  <property name="ctlField" value="core1"/>
      	  <property name="comment" value="Ceci est un exemple"/>
      	</core>
        </cores>
      </solr>
      

      version : if you update your solr.xml or your cores for various motives, it can be useful to track of a version. In this example, this will be used to define the dataDir for each core.
      en-cores,fr-cores: with aliases, if the list is long or repetitive, it might be convenient to use a property that can then be used to describe the Solr core name.
      instanceDir: note that both cores will use the same instance directory, sharing their configuration and schema. The dataDir will be set for each of them from the solrconfig.xml.

      solrconfig.xml

      This is where our solr.xml property are used to define the data directory as a composition of, in our example, the language code l10n and the core version stored in version.

      <config>
        <dataDir>${solr.solr.home}/data/${l10n}-${version}</dataDir>
      ....
      </config>
      
      schema.xml

      The include allows to import a file within the schema (or a solrconfig); this can help de-clutter long schemas or reuse parts.
      The ctlField is just illustrating that a field & its type can be set through properties as well; in our example, we will want the 'english' core to refer to an 'english-configured' field and the 'french' core to a 'french-configured' one. The type for the field is defined as text-EN or text-FR after expansion.

      <schema name="example core ${l10n}" version="1.1">
        <types>
      ...
         <include resource="text-l10n.xml"/>
        </types>
      
       <fields>   
      ...
        <field name="${ctlField}"   type="text-${l10n}"   indexed="true"  stored="true"  multiValued="true" /> 
       </fields>
      

      This schema is importing this text-l10n.xml file which is a fragment; the fragment tag must be present & indicates the file is to be included. Our example only defines different stopwords for each language but you could of course extend this to stemmers, synonyms, etc.

      <fragment>
      	<fieldType name="text-FR" class="solr.TextField" positionIncrementGap="100">
      ...
      	    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-fr.txt"/>
      ...
      	</fieldType>
      	<fieldType name="text-EN" class="solr.TextField" positionIncrementGap="100">
      ...
      	    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-en.txt"/>
      ...
      	</fieldType>
      </fragment>
      

      Alternatively, one can use XML entities using the 'solr:' protocol to the same end as in:

      <!DOCTYPE schema [
      <!ENTITY textL10n SYSTEM "solr:${l10ntypes}">
      ]>
      <schema name="example core ${l10n}" version="1.1">
        <types>
         <fieldtype name="string"  class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
         <!--include resource="text-l10n.xml"/-->
         &textL10n;
        </types>
        ...
      </schema>
      

      Technical specifications

      solr.xml can define properties at the multicore & each core level.
      Properties defined in the multicore scope can override system properties.
      Properties defined in a core scope can override multicore & system properties.
      Property definitions can use expressions to define their name & value; these expressions are evaluated in their outer scope context .
      CoreContainer serialization keeps properties as defined; persistence is idem-potent. (ie property expressions are written, not their evaluation).

      The core descriptor properties are automatically defined in each core context, namely:
      solr.core.instanceDir
      solr.core.name
      solr.core.configName
      solr.core.schemaName

      Coding notes:

      • DOMUtil.java:
        cosmetic changes
        toMapExcept systematically skips 'xml:base" attributes (which may come from entity resolving)
      • CoreDescriptor.java:
        The core descriptor does not store properties as values but as expressions (and all its members can be property expressions as well) allowing to write file as defined (not as evaluated)
        The public getCoreProperties is removed for that reason. (too bad we were in such a rush...)
      • CoreContainer.java:
        changes related to extracting the core names before they are evaluated in load()
        changes related to evaluating core descriptor member before adding them to the core's loader properties
        fix in persistFile which was not interpreting relative pathes correctly
        fix in persist because properties were not written at the right place
        changes in persist to write expressions (and core name when it is one)
      • Config.java:
        subsituteProperties has been moved out of constructor so calls must be explicit.
        added the entity resolver
        added subsituteIncludes which processes <include name.../>
      • SolrConfig.java & IndexSchema.java
        added explicit calls to substituteIncludesto perform property/include expansion
      • SolrResourceLoader.java
        cosmetic, changed getCoreProperties to getProperties (since they may come from the CoreContainer)
      • SolrProperties.java:
        schema uses a localization (l10n) property to define an attribute
        persists the file to check it keeps the expression properties
      • QueryElevationComponent.java
        Needed to explicitly call substituteProperties.
      1. solr-646.patch
        46 kB
        Henri Biestro
      2. solr-646.patch
        71 kB
        Henri Biestro
      3. solr-646.patch
        69 kB
        Henri Biestro
      4. solr-646.patch
        90 kB
        Henri Biestro
      5. solr-646.patch
        85 kB
        Henri Biestro
      6. solr-646.patch
        84 kB
        Henri Biestro
      7. solr-646.patch
        78 kB
        Henri Biestro
      8. solr-646.patch
        55 kB
        Henri Biestro
      9. SOLR-646.patch
        21 kB
        Shalin Shekhar Mangar

        Issue Links

          Activity

          Hide
          Henri Biestro added a comment -

          supersedes the version in solr-350 with some improvements & bug fixes.
          Multicore will attempt to load the bundle "multicore" (aka tries to load a multicore.properties) allowing to define installation wide properties.
          Unrelated but convenient to test/verify persistence, solrj now allows to persist the file with an explicit name.

          Show
          Henri Biestro added a comment - supersedes the version in solr-350 with some improvements & bug fixes. Multicore will attempt to load the bundle "multicore" (aka tries to load a multicore.properties) allowing to define installation wide properties. Unrelated but convenient to test/verify persistence, solrj now allows to persist the file with an explicit name.
          Hide
          Noble Paul added a comment -

          Can we see a use case for this kind of configuration?

          Show
          Noble Paul added a comment - Can we see a use case for this kind of configuration?
          Hide
          Henri Biestro added a comment -

          Added some use cases. Thanks Paul.

          Show
          Henri Biestro added a comment - Added some use cases. Thanks Paul.
          Hide
          Noble Paul added a comment -

          how is this evaluated? The syntax is not obvious as to what it is doing. I am referring to the type attribute in the following snippet

           <field name="id"   type="${id_type:id_t}"   indexed="true"  stored="true"  multiValued="false" required="true"/>
          

          I can't imagine a usecase where field types are dynamically assigned

          Show
          Noble Paul added a comment - how is this evaluated? The syntax is not obvious as to what it is doing. I am referring to the type attribute in the following snippet <field name= "id" type= "${id_type:id_t}" indexed= " true " stored= " true " multiValued= " false " required= " true " /> I can't imagine a usecase where field types are dynamically assigned
          Hide
          Henri Biestro added a comment -

          ...type='$

          {id_type:id_t}

          '... : evaluate the property 'id_type', if no value found default to 'id_t''.
          Let's consider you want to have the same schema (at the field level) but in different cores for different languages; you can dynamically set a field type based on a locale (stemming/stopwords).
          I agree that using the id field in the example is misleading.

          Show
          Henri Biestro added a comment - ...type='$ {id_type:id_t} '... : evaluate the property 'id_type', if no value found default to 'id_t''. Let's consider you want to have the same schema (at the field level) but in different cores for different languages; you can dynamically set a field type based on a locale (stemming/stopwords). I agree that using the id field in the example is misleading.
          Hide
          Henri Biestro added a comment -

          I'll update soon the patch to add an 'import' feature (allowing to import chunks of xml) in config & schema as in <import resource='stuff.xml'/> or even <import resource='$

          {my.types}

          '/>.

          Configuration not being a high-priority feature in the Solr community, let me know if anyone is interested so I put more efforts in documenting this & the follow up.

          Show
          Henri Biestro added a comment - I'll update soon the patch to add an 'import' feature (allowing to import chunks of xml) in config & schema as in <import resource='stuff.xml'/> or even <import resource='$ {my.types} '/>. Configuration not being a high-priority feature in the Solr community, let me know if anyone is interested so I put more efforts in documenting this & the follow up.
          Hide
          Shalin Shekhar Mangar added a comment -

          I haven't looked into the patch but just fyi, there's a VariableResolver, VariableResolverImpl and TemplateString classes in DataImportHandler (SOLR-469) that also support general variable substitution. However that code does not support specifying defaults the way DOMUtil does.

          Show
          Shalin Shekhar Mangar added a comment - I haven't looked into the patch but just fyi, there's a VariableResolver, VariableResolverImpl and TemplateString classes in DataImportHandler ( SOLR-469 ) that also support general variable substitution. However that code does not support specifying defaults the way DOMUtil does.
          Hide
          Henri Biestro added a comment -

          Besides the <property .../> syntax, the latest patch version adds the <import resource='resource'/> (where resource can be a constant or a variable).
          This means config and schemas can be chunked in multiple pieces that can be reused in different cores. Combined with properties, this completely frees up core configuration.
          Also added a <fragment>...</fragment> tag to allow importing XML chunks that don't have a natural root node.

          The PropertyMap.imports() allows to enumerate which expressions were resolved as resources locally (aka config, schema, import) and their logical location when implicit. (Map.Entry<String,String[]> e; e.getValue()[0] is the canonical resource value, getValue()[1] the 'symbolic' one. When value[0] == value[1], it means this was absolute or found in a JAR.

          Shalin, thanks, good to know; I've only been using/refactoring DOMUtil since this patch mainly deals with property expansion (besides definition & book-keeping).

          Show
          Henri Biestro added a comment - Besides the <property .../> syntax, the latest patch version adds the <import resource='resource'/> (where resource can be a constant or a variable). This means config and schemas can be chunked in multiple pieces that can be reused in different cores. Combined with properties, this completely frees up core configuration. Also added a <fragment>...</fragment> tag to allow importing XML chunks that don't have a natural root node. The PropertyMap.imports() allows to enumerate which expressions were resolved as resources locally (aka config, schema, import) and their logical location when implicit. (Map.Entry<String,String[]> e; e.getValue() [0] is the canonical resource value, getValue() [1] the 'symbolic' one. When value [0] == value [1] , it means this was absolute or found in a JAR. Shalin, thanks, good to know; I've only been using/refactoring DOMUtil since this patch mainly deals with property expansion (besides definition & book-keeping).
          Hide
          Hoss Man added a comment -

          i haven't looked at the patch, but i'm +1 on the write up and the description of how things would/should work ... the only thing that jumps out at me is the automatic properties...

          solr.core.instanceDir
          solr.core.instancePath
          solr.core.name
          solr.core.configName
          solr.core.schemaName

          ...first off, i'm not sure what the difference between instanceDir and instancePath are or what configName means (but that might just be my general out of touch-ness with all the multicore advancements, in which case don't worry about it.) My main questions:

          • Are the values of these automatic properties invariant, or can the be overridden?
          • Should we use some special prefix/syntax for automatic properties such that we reduce/eliminate the risk of collision if we add new ones in future versions? (example: the automatic properties could all have names starting with "_", and the code that reads property declarations could ignore/warn if a user attempts to declare a property that (might) collide in the future.
          Show
          Hoss Man added a comment - i haven't looked at the patch, but i'm +1 on the write up and the description of how things would/should work ... the only thing that jumps out at me is the automatic properties... solr.core.instanceDir solr.core.instancePath solr.core.name solr.core.configName solr.core.schemaName ...first off, i'm not sure what the difference between instanceDir and instancePath are or what configName means (but that might just be my general out of touch-ness with all the multicore advancements, in which case don't worry about it.) My main questions: Are the values of these automatic properties invariant, or can the be overridden? Should we use some special prefix/syntax for automatic properties such that we reduce/eliminate the risk of collision if we add new ones in future versions? (example: the automatic properties could all have names starting with "_", and the code that reads property declarations could ignore/warn if a user attempts to declare a property that (might) collide in the future.
          Hide
          Lance Norskog added a comment -

          How will these files and substituted properties show up in the admin JSPs?

          Show
          Lance Norskog added a comment - How will these files and substituted properties show up in the admin JSPs?
          Hide
          Henri Biestro added a comment -

          About documentation:
          I'll try to come with better (ie more meaningful) examples - although I kinda feel "impaired" to describe the 3 tags (property/import/fragment) if what I've already written does not convey 90% of how things would/should work already. I welcome any help/hint on what a useful documentation should look like (any template I could/should follow)?
          On the documentation editing/releasing process, I'd tend to go for refining this issue description till the code gets committed at which time I'd copy/paste it in a Wiki page if that's OK.

          About "automatic" properties:
          Those reflect the Core descriptor values or in more general terms, any Solr install/multicore/core configuration/schema value used in the system but defined through another API; the core name, instance directory, configuration file name (defaults to solrconfig.xml), schema file name (defaults to schema.xml) all come form the core descriptor .
          solr.core.instanceDir is strictly what the core descriptor instance directory is; solr.core.instancePath is the non-relative version of it. When solr.core.instanceDir is not an absolute path, is is relative to the solr.home instance directory; in this case, solr.core.instancePath is the non relative version. Otherwise, they are just the same; overall, this just reflect what the code does but removes any ambiguous implicits (which is what the PropertyMap.imports() also resolves*).

          I've also added the solr.core.schemaDir & solr.core.configDir- the directories where the config & schema files reside. We have no way yet to specify those last two but they are inferred from the other values - which would be their defaults if un-specified.There is a functional choice to make about how we want to complete the "core" properties; the "missing" parts are the config & schema directories. My preference would be to extend the core descriptor and reflect those as properties but we might want&try to go the other way, put everything as properties & retain the API as shortcuts.

          Regarding naming conventions/protection, I would tend to go for the natural attribute-like convention & protect those; the automatic/reserved namespaces would be any property starting with 'solr.' (including 'solr.core' and 'solr.multicore') and disallow/warn about any attempt to override. Would that be acceptable/enough ?

          About files/substituted properties & admin JSPs:
          As is, nothing has changed; getting the schema/config return the file as it exists in the file-system. The first natural extension would be that for config & schema, when allowed, the imported files should be downloadable too. It might be nice to have the download expanded version of each file for debugging purpose as well if this is actually useful and/or the expanded version of config & schema; any functional suggestions is welcome. Lance, what do you think would be good to have?

          About PropertyMap.imports():
          The rationale behind this besides debugging/logging purpose would be "dynamic" replication of a core (a slave Slr 'install' could automatically replicate new "master' cores. The goal of this API is to have the canonical (local) path of each file - so we can read them- and their logical/symbolic counterparts so their destination (remote) path can be determined, each core's solrconfig.xml itself being "replicable".

          About the issue itself:
          How much of these should be specified/implemented before the patch can be committed and/or is there any chance to put this in the 1.3 scope ?

          Show
          Henri Biestro added a comment - About documentation: I'll try to come with better (ie more meaningful) examples - although I kinda feel "impaired" to describe the 3 tags (property/import/fragment) if what I've already written does not convey 90% of how things would/should work already. I welcome any help/hint on what a useful documentation should look like (any template I could/should follow)? On the documentation editing/releasing process, I'd tend to go for refining this issue description till the code gets committed at which time I'd copy/paste it in a Wiki page if that's OK. About "automatic" properties: Those reflect the Core descriptor values or in more general terms, any Solr install/multicore/core configuration/schema value used in the system but defined through another API; the core name, instance directory, configuration file name (defaults to solrconfig.xml), schema file name (defaults to schema.xml) all come form the core descriptor . solr.core.instanceDir is strictly what the core descriptor instance directory is; solr.core.instancePath is the non-relative version of it. When solr.core.instanceDir is not an absolute path, is is relative to the solr.home instance directory; in this case, solr.core.instancePath is the non relative version. Otherwise, they are just the same; overall, this just reflect what the code does but removes any ambiguous implicits (which is what the PropertyMap.imports() also resolves*). I've also added the solr.core.schemaDir & solr.core.configDir- the directories where the config & schema files reside. We have no way yet to specify those last two but they are inferred from the other values - which would be their defaults if un-specified.There is a functional choice to make about how we want to complete the "core" properties; the "missing" parts are the config & schema directories. My preference would be to extend the core descriptor and reflect those as properties but we might want&try to go the other way, put everything as properties & retain the API as shortcuts. Regarding naming conventions/protection, I would tend to go for the natural attribute-like convention & protect those; the automatic/reserved namespaces would be any property starting with 'solr.' (including 'solr.core' and 'solr.multicore') and disallow/warn about any attempt to override. Would that be acceptable/enough ? About files/substituted properties & admin JSPs: As is, nothing has changed; getting the schema/config return the file as it exists in the file-system. The first natural extension would be that for config & schema, when allowed, the imported files should be downloadable too. It might be nice to have the download expanded version of each file for debugging purpose as well if this is actually useful and/or the expanded version of config & schema; any functional suggestions is welcome. Lance, what do you think would be good to have? About PropertyMap.imports(): The rationale behind this besides debugging/logging purpose would be "dynamic" replication of a core (a slave Slr 'install' could automatically replicate new "master' cores. The goal of this API is to have the canonical (local) path of each file - so we can read them- and their logical/symbolic counterparts so their destination (remote) path can be determined, each core's solrconfig.xml itself being "replicable". About the issue itself: How much of these should be specified/implemented before the patch can be committed and/or is there any chance to put this in the 1.3 scope ?
          Hide
          Jeremy Hinegardner added a comment -

          There seems to be some issues with this patch, I would like to use it, but it doesn't apply cleanly to svn head, patch seems to mess up in the first hunk of Config.java. I manually applied that portion, and now I'm recieveing an the error:

          [javac] /Users/jeremy/repos/svn/solr/apache-solr-nightly/src/java/org/apache/solr/core/MultiCore.java:106: cannot find symbol
          [javac] symbol  : method keySet()
          [javac] location: class java.util.ResourceBundle
          [javac]         Iterator<String> ikeys = bundle.keySet().iterator();
          

          My java is rusty, but this appears to be a Java 6 only call on Resource Bundle. Is solr 1.3 going to be Java 6 only ?

          Show
          Jeremy Hinegardner added a comment - There seems to be some issues with this patch, I would like to use it, but it doesn't apply cleanly to svn head, patch seems to mess up in the first hunk of Config.java. I manually applied that portion, and now I'm recieveing an the error: [javac] /Users/jeremy/repos/svn/solr/apache-solr-nightly/src/java/org/apache/solr/core/MultiCore.java:106: cannot find symbol [javac] symbol : method keySet() [javac] location: class java.util.ResourceBundle [javac] Iterator<String> ikeys = bundle.keySet().iterator(); My java is rusty, but this appears to be a Java 6 only call on Resource Bundle. Is solr 1.3 going to be Java 6 only ?
          Hide
          Ryan McKinley added a comment -

          yes, keySet() is 1.6 only....

          Solr is 1.5, so this will need to get changed before going in...

          Show
          Ryan McKinley added a comment - yes, keySet() is 1.6 only.... Solr is 1.5, so this will need to get changed before going in...
          Hide
          Ryan McKinley added a comment -

          About the issue itself:
          How much of these should be specified/implemented before the patch can be committed and/or is there any chance to put this in the 1.3 scope ?

          Yes, I think this should be in 1.3 – it makes MultiCore dramatically more useful...

          Show
          Ryan McKinley added a comment - About the issue itself: How much of these should be specified/implemented before the patch can be committed and/or is there any chance to put this in the 1.3 scope ? Yes, I think this should be in 1.3 – it makes MultiCore dramatically more useful...
          Hide
          Raghu kashyap added a comment -

          +1 for it being in 1.3

          We are using multicore and eagerly waiting for 1.3

          Show
          Raghu kashyap added a comment - +1 for it being in 1.3 We are using multicore and eagerly waiting for 1.3
          Hide
          Henri Biestro added a comment -

          Fixed dependency on Java 6; my bad, sorry Jeremy.

          Solved 2 questions pending from the previous patch:

          1 - Added schemaDir & configDir to solr core descriptors; this means all 'automatic' properties come as representations of values coming from the core descriptor. I removed the solr.core.instancePath and homogenized the solr.core.

          {instance,schema,config}

          Dir properties to be expanded relative to the multicore instance dir if not absolute.
          2 - Added some checking/warning if some user defined properties collide with automatic ones during core creation; the system computed value is the one kept. The current list of automated properties (& example values):
          solr.core.configDir = ..\..\..\example\multicore\core0\conf/
          solr.core.configName = solrconfig.xml
          solr.core.name = core0
          solr.core.instanceDir = ..\..\..\example\multicore\core0/
          solr.core.schemaName = schema.xml
          solr.core.schemaDir = ..\..\..\example\multicore\core0\conf/

          I'll udpate the issue description to reflect the current choices.

          New (sub) issue/choice to make:
          There is no way yet to specify properties through solrj when creating a core (nor a way to specify the core

          {config, schema} {Dir, Name}

          ). Any preferred syntax? property.<name_of_property0>=<value_of_property0>&property.<name_of_property1>=<value_of_property1> is my current idea to pass them in a http request but this is heavy...

          Show
          Henri Biestro added a comment - Fixed dependency on Java 6; my bad, sorry Jeremy. Solved 2 questions pending from the previous patch: 1 - Added schemaDir & configDir to solr core descriptors; this means all 'automatic' properties come as representations of values coming from the core descriptor. I removed the solr.core.instancePath and homogenized the solr.core. {instance,schema,config} Dir properties to be expanded relative to the multicore instance dir if not absolute. 2 - Added some checking/warning if some user defined properties collide with automatic ones during core creation; the system computed value is the one kept. The current list of automated properties (& example values): solr.core.configDir = ..\..\..\example\multicore\core0\conf/ solr.core.configName = solrconfig.xml solr.core.name = core0 solr.core.instanceDir = ..\..\..\example\multicore\core0/ solr.core.schemaName = schema.xml solr.core.schemaDir = ..\..\..\example\multicore\core0\conf/ I'll udpate the issue description to reflect the current choices. New (sub) issue/choice to make: There is no way yet to specify properties through solrj when creating a core (nor a way to specify the core {config, schema} {Dir, Name} ). Any preferred syntax? property.<name_of_property0>=<value_of_property0>&property.<name_of_property1>=<value_of_property1> is my current idea to pass them in a http request but this is heavy...
          Hide
          Ryan McKinley added a comment -

          i'm having trouble applying this patch... any chance you could generate an updated one?

          Show
          Ryan McKinley added a comment - i'm having trouble applying this patch... any chance you could generate an updated one?
          Hide
          Henri Biestro added a comment -

          updated to trunk 684310

          Show
          Henri Biestro added a comment - updated to trunk 684310
          Hide
          Hoss Man added a comment -

          I started digging into this patch today, making notes for myself as i go along, and revising them as i understand more and more about the code. i still don't feel like i fully grasp a lot of this patch, but i have to go offline pretty soon, and i'm not sure i'll get a chance to dig into this again for a few days, so i wanted to post my rough notes so far. (i've done some minimal cleanup to make them wiki formatted, but they're still be in my convoluted stream of consciousness rambling)

          NOTE: the patch doesn't apply tothe trunk anymore (hey look at that, MultiCore.java is gone) so these are based on applying the patch to r684307.


          • are there any unit tests specificly for this new stuff? At a minimum the existing test configs should excercies this functionality so we have some confidence that it works.
          • the changes to the example configs seem overly contrived:
            • excessivly using includes
            • property for type name in both the fieldtype and field (which
              makes it irrelevant what the actual value is – even if the property
              isn't evaluated the literal string "$ {id_type:id_t}

              " is a valid
              fieldtype name)

          • we should reduce the usages in the example configs, and make them represent slightly more practical usecases...
            • <import> to include different postCommit fragements depending
              on master/slave property.
            • pick dataDir based on corename
            • <import> to add some field declarations to schema (not all)
            • property to override which fieldtype name a field should use (leave the <fieldtype> declaration static)
          • <fragement> should be used more consistently. Even if the file being included already has a root node (and is wellformed) the <import> tag should inforce that the imported file be a valid fragement file so we know the file really was ment to be inlcuded.
          • <import> seems like it should be named <include>
          • I really don't understand ParamMap.imports(). for starters the javadocs seem like cut/paste mistake. Beyond that: if I'm understanding the Jira comments, the String[] represents a "pair" of bits of info about a resource the first being the relative path as specified in the <import> tag, the second being the directory it was found in (or maybe the absolute path??) ... so what then is the key to the Map<String,String[]> ... what is the "symbolic property" ???? ... going deeper down the rabit hole, why do we need new public methods in SolrResourceLoader? what do they do that the existing methos didn't take care of? (they have no javadocs)
          • is the new "private InputStream open(String,String[])" method in SolrResourceLoader suppose to be modifying the array that is passed to it? (it's not a documented side effect) ... even after readingteh docs for this method, and all 10 lines of code in the method, i'm still not sure i understand what it's doing or why (what does "convenient to export" and "convenient to import" mean?)
          • Do we really need PropertyMap, Evaluator, and all of hte public methods they have to be public? On the whole it seems like a lot of new plumbing is being exposed here – and we already have more exposed plumbing then we probably should.
            It seems like the only API most code should need to worry about should be something like...
               /**
                * Given a DOM Document, and some properties identifing the current
                * context, generates a new DOM Document when imports and property
                * value substitutions performed.
                * @param doc The Document to evaluate
                * @param context existing variables that should be considered when
                *                processing doc
                * @param loader for resolving filenames
                */
                public static dom.Document evaluateProps(dom.Document doc,
                                                         Properties context,
            					     SolrResourceLoader loader);
            

            ...and even that could probably be encapsulated and hidden inside of the Config class and called as part of it's constructor (with a new Properties or SolrParams constructor arg to specify the outer "context")

          • there seems to be an awful lot of code scattered arround checking if "x != PropertyMap.SYSTEM_PROPERTIES" and doing something special in that case ... whatever the concerns are about this, it should probably be handled by the impl of PropertyMap.SYSTEM_PROPERTIES (even if that means it needs to be a special subclass of PropertyMap) so we don't have to worry about new code being added without remember to be as paranoid.
          • this is really unclear, even with the coment ...
                   // if this is a fragment, isfrag becomes true (thus the '=', *not* '==')
                   if (isfrag = "fragment".equals(walk.getNodeName()))
            

            ... we can replace those two lines with these two lines and it's even clearer...

                   isfrag = "fragment".equals(walk.getNodeName())
                   if (isfrag)
             
          • Should we be worried about removing some public static methods from DOMUtil? It would probably be better to just stop using them, and mark them deprecated (no reason to risk breaking someone who decided to use them, even if the odds are low that anyone is).

          On the whole: there is more complexity here then I would expect for the goal we are trying to accomplish, and it's not clear to me that it's necessary .... i haven't had the "Ahhhh, i see now" moment that makes me understand why the new code jumps through as many hoops as it does.

          Show
          Hoss Man added a comment - I started digging into this patch today, making notes for myself as i go along, and revising them as i understand more and more about the code. i still don't feel like i fully grasp a lot of this patch, but i have to go offline pretty soon, and i'm not sure i'll get a chance to dig into this again for a few days, so i wanted to post my rough notes so far. (i've done some minimal cleanup to make them wiki formatted, but they're still be in my convoluted stream of consciousness rambling) NOTE: the patch doesn't apply tothe trunk anymore (hey look at that, MultiCore.java is gone) so these are based on applying the patch to r684307. are there any unit tests specificly for this new stuff? At a minimum the existing test configs should excercies this functionality so we have some confidence that it works. the changes to the example configs seem overly contrived: excessivly using includes property for type name in both the fieldtype and field (which makes it irrelevant what the actual value is – even if the property isn't evaluated the literal string "$ {id_type:id_t} " is a valid fieldtype name) we should reduce the usages in the example configs, and make them represent slightly more practical usecases... <import> to include different postCommit fragements depending on master/slave property. pick dataDir based on corename <import> to add some field declarations to schema (not all) property to override which fieldtype name a field should use (leave the <fieldtype> declaration static) <fragement> should be used more consistently. Even if the file being included already has a root node (and is wellformed) the <import> tag should inforce that the imported file be a valid fragement file so we know the file really was ment to be inlcuded. <import> seems like it should be named <include> I really don't understand ParamMap.imports(). for starters the javadocs seem like cut/paste mistake. Beyond that: if I'm understanding the Jira comments, the String[] represents a "pair" of bits of info about a resource the first being the relative path as specified in the <import> tag, the second being the directory it was found in (or maybe the absolute path??) ... so what then is the key to the Map<String,String[]> ... what is the "symbolic property" ???? ... going deeper down the rabit hole, why do we need new public methods in SolrResourceLoader? what do they do that the existing methos didn't take care of? (they have no javadocs) is the new "private InputStream open(String,String[])" method in SolrResourceLoader suppose to be modifying the array that is passed to it? (it's not a documented side effect) ... even after readingteh docs for this method, and all 10 lines of code in the method, i'm still not sure i understand what it's doing or why (what does "convenient to export" and "convenient to import" mean?) Do we really need PropertyMap, Evaluator, and all of hte public methods they have to be public? On the whole it seems like a lot of new plumbing is being exposed here – and we already have more exposed plumbing then we probably should. It seems like the only API most code should need to worry about should be something like... /** * Given a DOM Document, and some properties identifing the current * context, generates a new DOM Document when imports and property * value substitutions performed. * @param doc The Document to evaluate * @param context existing variables that should be considered when * processing doc * @param loader for resolving filenames */ public static dom.Document evaluateProps(dom.Document doc, Properties context, SolrResourceLoader loader); ...and even that could probably be encapsulated and hidden inside of the Config class and called as part of it's constructor (with a new Properties or SolrParams constructor arg to specify the outer "context") there seems to be an awful lot of code scattered arround checking if "x != PropertyMap.SYSTEM_PROPERTIES" and doing something special in that case ... whatever the concerns are about this, it should probably be handled by the impl of PropertyMap.SYSTEM_PROPERTIES (even if that means it needs to be a special subclass of PropertyMap) so we don't have to worry about new code being added without remember to be as paranoid. this is really unclear, even with the coment ... // if this is a fragment, isfrag becomes true (thus the '=', *not* '==') if (isfrag = "fragment" .equals(walk.getNodeName())) ... we can replace those two lines with these two lines and it's even clearer... isfrag = "fragment" .equals(walk.getNodeName()) if (isfrag) Should we be worried about removing some public static methods from DOMUtil? It would probably be better to just stop using them, and mark them deprecated (no reason to risk breaking someone who decided to use them, even if the odds are low that anyone is). On the whole: there is more complexity here then I would expect for the goal we are trying to accomplish, and it's not clear to me that it's necessary .... i haven't had the "Ahhhh, i see now" moment that makes me understand why the new code jumps through as many hoops as it does.
          Hide
          Henri Biestro added a comment - - edited

          Thanks HossMan for the review;
          I've overhauled the code following your comments (more focused, less code, less public) & I'm writing more tests.

          One important question: do we need or even want the "include" feature or should I drop it (at least for now)?

          Should be able to upload a "take-2" in a few days.

          Show
          Henri Biestro added a comment - - edited Thanks HossMan for the review; I've overhauled the code following your comments (more focused, less code, less public) & I'm writing more tests. One important question: do we need or even want the "include" feature or should I drop it (at least for now)? Should be able to upload a "take-2" in a few days.
          Hide
          Ryan McKinley added a comment -

          do we need or even want the "include" feature or should I drop it (at least for now)?

          drop it... for 1.3 this should focus on a coherent way to set the dataDir for multicore configs. Anythign else can wait till later...

          Show
          Ryan McKinley added a comment - do we need or even want the "include" feature or should I drop it (at least for now)? drop it... for 1.3 this should focus on a coherent way to set the dataDir for multicore configs. Anythign else can wait till later...
          Hide
          Henri Biestro added a comment -

          Refactored following HossMan recommendations.
          This still includes the "include" (was import) feature but removing it is easy (in Config.java) & will reduce modifications in ResourceLoader.java.
          Added a specific test that loads a shared schema & config but uses properties to differentiate behavior (different locales, different dataDirs).
          Includes a one letter fix in CoreContainer.load for aliases (get(a) instead of get).
          More tests will follow (persist verification is "manual") & I'll try to cut the patch in 2 (properties - include).
          I might be busier in the coming days but will try not to slow down 1.3 release if next review says this is on the right track.

          Show
          Henri Biestro added a comment - Refactored following HossMan recommendations. This still includes the "include" (was import) feature but removing it is easy (in Config.java) & will reduce modifications in ResourceLoader.java. Added a specific test that loads a shared schema & config but uses properties to differentiate behavior (different locales, different dataDirs). Includes a one letter fix in CoreContainer.load for aliases (get(a) instead of get ). More tests will follow (persist verification is "manual") & I'll try to cut the patch in 2 (properties - include). I might be busier in the coming days but will try not to slow down 1.3 release if next review says this is on the right track.
          Hide
          Ryan McKinley added a comment -

          I am leaving on vacation... and will be unable to give this any attention, so i have removed myself as the "Assignee"

          Show
          Ryan McKinley added a comment - I am leaving on vacation... and will be unable to give this any attention, so i have removed myself as the "Assignee"
          Hide
          Shalin Shekhar Mangar added a comment - - edited

          Here's another attempt on this issue.

          1. Four types of properties – core specific, implicit, global and system (evaluated in that order)
          2. Syntax for properties in solr.xml is modified slightly to use attributes instead of element values
            <solr persistent="false">
            
              <property name="var" value="value" />
            
              <cores adminPath="/admin/cores">
                <core name="core0" instanceDir="core0">
                  <property name="var" value="value" />
                </core>
                <core name="core1" instanceDir="core1" />
              </cores>
            </solr>
            
          3. Supported automatic or implicit properties are:
            solr.core.instanceDir
            solr.core.name
            solr.core.configName
            solr.core.schemaName
            
          4. SolrResourceLoader keeps a reference to the CoreDescriptor to get access to the core specific properties.
          5. PropertyReplacer is heavily borrowed from TemplateString in DataImportHandler. It is trimmed down to our needs and enhanced to support default values. TestPropertyReplacer is a unit test for this class. PropertyReplacer and TemplateString can be merged together in the future but at this stage I don't want us to have any regressions.
          6. It doesn't support escaping variables like the previous DOMUtil code e.g. $$ {literal} gives you $${literal}

            but it used to give $

            {literal}

            . I don't think we need that capability anyway.

          7. No support for fragment and imports – I'm strongly against it. I don't think there's anything which we can't do without them and they make for poor readability.

          Review comments and suggestions are invited. As soon as we are done with this issue we can start the release process for 1.3

          Show
          Shalin Shekhar Mangar added a comment - - edited Here's another attempt on this issue. Four types of properties – core specific, implicit, global and system (evaluated in that order) Syntax for properties in solr.xml is modified slightly to use attributes instead of element values <solr persistent= "false" > <property name= "var" value= "value" /> <cores adminPath= "/admin/cores" > <core name= "core0" instanceDir= "core0" > <property name= "var" value= "value" /> </core> <core name= "core1" instanceDir= "core1" /> </cores> </solr> Supported automatic or implicit properties are: solr.core.instanceDir solr.core.name solr.core.configName solr.core.schemaName SolrResourceLoader keeps a reference to the CoreDescriptor to get access to the core specific properties. PropertyReplacer is heavily borrowed from TemplateString in DataImportHandler. It is trimmed down to our needs and enhanced to support default values. TestPropertyReplacer is a unit test for this class. PropertyReplacer and TemplateString can be merged together in the future but at this stage I don't want us to have any regressions. It doesn't support escaping variables like the previous DOMUtil code e.g. $$ {literal} gives you $${literal} but it used to give $ {literal} . I don't think we need that capability anyway. No support for fragment and imports – I'm strongly against it. I don't think there's anything which we can't do without them and they make for poor readability. Review comments and suggestions are invited. As soon as we are done with this issue we can start the release process for 1.3
          Hide
          Henri Biestro added a comment -

          Not sure this is my place to comment since I'm no committer but I don't see what this brings that the previous patch did not (and do see that 'import' or sharing schema/config is not a feature that should be discussed)..
          It would have been helpful to me to understand why the updated patch is not even considered.
          All yours then.

          Show
          Henri Biestro added a comment - Not sure this is my place to comment since I'm no committer but I don't see what this brings that the previous patch did not (and do see that 'import' or sharing schema/config is not a feature that should be discussed).. It would have been helpful to me to understand why the updated patch is not even considered. All yours then.
          Hide
          Shalin Shekhar Mangar added a comment -

          Henri – You have been doing all the work on this so you are the best person to comment on it

          The last patch was not applying to the trunk. It had a class called SolrProperties in the test sources which looked like a copy of MultiCoreExampleTestBase. It also looked like the last patch had some remnants from other multicore related patches. It was getting tough to get my head around the whole code path with Raw, Evaluator and Includes and I just tried to see if we can have a more simple way of solving the core problem i.e. property substitution in solconfig and schema to help with multi core installations.

          The patch I uploaded has no support for multicore.properties – I probably overlooked it. How does this property file help as compared to specifying properties in solr.xml directly?

          I do not want to summarily dismiss 'import' or any other feature. Please feel free to help me with use-cases. I admit that I've not used multicore setup in multi-language environments so I may not have the insight that you have. In particular, I do not understand the use-case "Change the general layout between data, config & schema directories". The latest patch aims to fulfill the other two use-cases.

          Show
          Shalin Shekhar Mangar added a comment - Henri – You have been doing all the work on this so you are the best person to comment on it The last patch was not applying to the trunk. It had a class called SolrProperties in the test sources which looked like a copy of MultiCoreExampleTestBase. It also looked like the last patch had some remnants from other multicore related patches. It was getting tough to get my head around the whole code path with Raw, Evaluator and Includes and I just tried to see if we can have a more simple way of solving the core problem i.e. property substitution in solconfig and schema to help with multi core installations. The patch I uploaded has no support for multicore.properties – I probably overlooked it. How does this property file help as compared to specifying properties in solr.xml directly? I do not want to summarily dismiss 'import' or any other feature. Please feel free to help me with use-cases. I admit that I've not used multicore setup in multi-language environments so I may not have the insight that you have. In particular, I do not understand the use-case "Change the general layout between data, config & schema directories". The latest patch aims to fulfill the other two use-cases.
          Hide
          Noble Paul added a comment -

          Henri - your indignation is understandable

          We are in a hurry to make the 1.3 release and this is the last man standing. I also tried to apply the patch but it didn't quite apply on the trunk. With the trunk code changing in an hourly basis it can be frustrating at times for contributors submitting patch.

          Probably you can submit a modified patch or you can just add whatever is missing in shalin's patch and we can move forward.

          Show
          Noble Paul added a comment - Henri - your indignation is understandable We are in a hurry to make the 1.3 release and this is the last man standing. I also tried to apply the patch but it didn't quite apply on the trunk. With the trunk code changing in an hourly basis it can be frustrating at times for contributors submitting patch. Probably you can submit a modified patch or you can just add whatever is missing in shalin's patch and we can move forward.
          Hide
          Henri Biestro added a comment - - edited

          New leaner version since there was indeed a simpler way to achieve the same (reviewing & friendly pressure does help to improve, thanks Shalin).
          Fewer classes modified so the bulk of the code deals with going through XML configuration.
          I also removed loading properties from 'solr.properties' which was useless as it was.
          Added a specific persistence verification & a multicore inspired test to verify shared schema/config configuration.
          I updated the issue description to reflect the patch feature & reasons for code modification in Coding Notes (intended to help review).
          patch applied cleanly with 'patch -u -p0 < solr-646.patch' on trunk 687771

          Show
          Henri Biestro added a comment - - edited New leaner version since there was indeed a simpler way to achieve the same (reviewing & friendly pressure does help to improve, thanks Shalin). Fewer classes modified so the bulk of the code deals with going through XML configuration. I also removed loading properties from 'solr.properties' which was useless as it was. Added a specific persistence verification & a multicore inspired test to verify shared schema/config configuration. I updated the issue description to reflect the patch feature & reasons for code modification in Coding Notes (intended to help review). patch applied cleanly with 'patch -u -p0 < solr-646.patch' on trunk 687771
          Hide
          Noble Paul added a comment -

          The include allows to import a file within the schema (or a solrconfig); this can help de-clutter long schemas.

          I am not very happy about this feature . I feel it can seriously affect readability . Let us postpone this and have separate discussion on whether we should have it.

          I some how think that

          <property name="the-name" value="the value"/>
          <!-- is more readable and like ant than the foloowing one-->
          <property name="the-name">the-value</property>
          
          Show
          Noble Paul added a comment - The include allows to import a file within the schema (or a solrconfig); this can help de-clutter long schemas. I am not very happy about this feature . I feel it can seriously affect readability . Let us postpone this and have separate discussion on whether we should have it. I some how think that <property name= "the-name" value= "the value" /> <!-- is more readable and like ant than the foloowing one--> <property name= "the-name" >the-value</property>
          Hide
          Henri Biestro added a comment - - edited

          The "ant-ish" syntax is certainly more common; I'll fix this.

          <property name="the-name" value="the value"/>
          

          About the include; I do have the use case and it does make my solrconfig & schema more readable. And HossMan did not rule it out as a "no no".
          I understand that there are mixed fillings about it though; what about putting it in with an "experimental" state? (big warning & boolean in solr.xml).
          So we can have the discussion after letting people experiment; if response is bad or there are technical reasons to remove it, we'll just remove the code and be done (much easier than adding it later).

          Show
          Henri Biestro added a comment - - edited The "ant-ish" syntax is certainly more common; I'll fix this. <property name= "the-name" value= "the value" /> About the include; I do have the use case and it does make my solrconfig & schema more readable. And HossMan did not rule it out as a "no no". I understand that there are mixed fillings about it though; what about putting it in with an "experimental" state? (big warning & boolean in solr.xml). So we can have the discussion after letting people experiment; if response is bad or there are technical reasons to remove it, we'll just remove the code and be done (much easier than adding it later).
          Hide
          Shalin Shekhar Mangar added a comment -

          I feel that including fragments deserves it's own jira issue and a discussion on solr-dev. I cannot imagine a use-case for including xml fragments which configurable properties cannot handle. We should remove it from this issue and let this one focus on configurable properties. Similarly, persisting a core through the admin handler is something which does not belong to this issue.

          1. The Evaluator keeps a lot of state which may not be necessary. Why not delegate the house-keeping of expressions to the concerned class (CoreContainer, CoreDescriptor)?
            // the map of multicore/core scope name to expressions map
              protected Map<String, Map<String, String>> exprMap;
              // the map of multicore/core scope name to expressions
              protected Map<String, Properties> propMap;
              // the current expression scope (where new expressions are defined)
              protected Map<String, String> expressions;
              // the current expressions scope (where new expressions are defined)
              protected Properties properties;
            
          2. Evaluator is once defined in DOMUtil and then extended in two places. In particular the one inside CoreContainer seems unnecessary.
          3. Why are we going in a round about way of filling in properties in CoreContainer?
            Properties loaderProperties = new Properties();
            this.loader = new SolrResourceLoader(dir, null, loaderProperties);
            FileInputStream cfgis = new FileInputStream(configFile);
            final Evaluator eval = new Evaluator();
                  Config cfg = new Config(loader, "SOLR", cfgis, null) {
                    @Override
                    public void substituteProperties(Properties properties) throws ParserConfigurationException, IOException, SAXException {
                      // expands the doc, inject new properties in loaderProperties
                      eval.substituteProperties(properties, doc);
                      doLog(log, properties);
                    }
                  };
                  cfg.substituteProperties(loaderProperties);
            

            I'm still trying to figure out the exact place where the loaderProperties is filled in with the values read from solr.xml

          4. Attaching a version to each core is something that we should not put inside the example solr.xml. I don't think it is a good use-case. If you move to a newer version, you'd still need to re-index, then why keep a versioned data directory? Except for possibly a data.dir property, there's little need to put anything there. It almost gives the idea that solr has become language-aware
          5. I agree to Hoss's recommendation of not removing public methods (though I did that myself in my patch)

          The Evaluator has a lot of code for the functionality it provides. I felt that modifying the old substituteSystemProperties method by passing in a property object was all that is needed. I feel that things can be more simplified here before it can be committed.

          Show
          Shalin Shekhar Mangar added a comment - I feel that including fragments deserves it's own jira issue and a discussion on solr-dev. I cannot imagine a use-case for including xml fragments which configurable properties cannot handle. We should remove it from this issue and let this one focus on configurable properties. Similarly, persisting a core through the admin handler is something which does not belong to this issue. The Evaluator keeps a lot of state which may not be necessary. Why not delegate the house-keeping of expressions to the concerned class (CoreContainer, CoreDescriptor)? // the map of multicore/core scope name to expressions map protected Map< String , Map< String , String >> exprMap; // the map of multicore/core scope name to expressions protected Map< String , Properties> propMap; // the current expression scope (where new expressions are defined) protected Map< String , String > expressions; // the current expressions scope (where new expressions are defined) protected Properties properties; Evaluator is once defined in DOMUtil and then extended in two places. In particular the one inside CoreContainer seems unnecessary. Why are we going in a round about way of filling in properties in CoreContainer? Properties loaderProperties = new Properties(); this .loader = new SolrResourceLoader(dir, null , loaderProperties); FileInputStream cfgis = new FileInputStream(configFile); final Evaluator eval = new Evaluator(); Config cfg = new Config(loader, "SOLR" , cfgis, null ) { @Override public void substituteProperties(Properties properties) throws ParserConfigurationException, IOException, SAXException { // expands the doc, inject new properties in loaderProperties eval.substituteProperties(properties, doc); doLog(log, properties); } }; cfg.substituteProperties(loaderProperties); I'm still trying to figure out the exact place where the loaderProperties is filled in with the values read from solr.xml Attaching a version to each core is something that we should not put inside the example solr.xml. I don't think it is a good use-case. If you move to a newer version, you'd still need to re-index, then why keep a versioned data directory? Except for possibly a data.dir property, there's little need to put anything there. It almost gives the idea that solr has become language-aware I agree to Hoss's recommendation of not removing public methods (though I did that myself in my patch) The Evaluator has a lot of code for the functionality it provides. I felt that modifying the old substituteSystemProperties method by passing in a property object was all that is needed. I feel that things can be more simplified here before it can be committed.
          Hide
          Henri Biestro added a comment - - edited

          0 - I do use today an include mechanism of some sort so my field types are not defined in the schema; this allows to automate the schema generation from another tool & not have to deal with copying content.

          1 - the CoreContainer evaluator is a "transient" being; it keeps track of properties as they are "read" and how they are evaluated during configuration. It is not stored nor kept beyond the scope of load().
          I dont see what you mean by house keeping. The things we keep afterwards are property expressions (1 map in CoreContainer & 1 map per CoreDescriptor) and 1 Properties instance per SolrResourceLoader.

          2 - The CoreContainer Evaluator is the part that declares properties, assign them a scope, etc. The code is fairly well commented so I dont really get your comment. Anyway, it is necessary, for persistence to be idempotent, ie we should be able to write what we (just) read as in SolrProperties test, we need to keep these around.
          The Config evaluator is dealing with the include so I wont get there; what about the idea of an "experimental" state?

          3 - To keep the evaluator so we can extract the pieces of information it collected (cf 1). The loaderProperties are filled in by the evaluator as the last step of its subsituteProperties method. We have to do it at the end so the original evaluation scope is preserved during the whole evaluation.

          4 - You misinterpret my example; when I'm changing some indexing/schema configuration in an heavy way, I like to be able to test it before, etc; these changes (like changing stemmer, synonyms, etc) require reindexing. So, the version I'm talking about is the 'index data' version. To be strict, I do consider modifyng the schema a "major" version and modifying config a "minor" one wrt to index reindex/reload. But that would make the example a bit terse.

          5 - I believe I did not remove any public method in the patch, did I?

          6 - The visitor pattern used in the original substituteSystemProperties has just been 'methodified'. The code is no different than what it was and it allows to implement just what's needed by derivation.

          Show
          Henri Biestro added a comment - - edited 0 - I do use today an include mechanism of some sort so my field types are not defined in the schema; this allows to automate the schema generation from another tool & not have to deal with copying content. 1 - the CoreContainer evaluator is a "transient" being; it keeps track of properties as they are "read" and how they are evaluated during configuration. It is not stored nor kept beyond the scope of load(). I dont see what you mean by house keeping. The things we keep afterwards are property expressions (1 map in CoreContainer & 1 map per CoreDescriptor) and 1 Properties instance per SolrResourceLoader. 2 - The CoreContainer Evaluator is the part that declares properties, assign them a scope, etc. The code is fairly well commented so I dont really get your comment. Anyway, it is necessary, for persistence to be idempotent, ie we should be able to write what we (just) read as in SolrProperties test, we need to keep these around. The Config evaluator is dealing with the include so I wont get there; what about the idea of an "experimental" state? 3 - To keep the evaluator so we can extract the pieces of information it collected (cf 1). The loaderProperties are filled in by the evaluator as the last step of its subsituteProperties method. We have to do it at the end so the original evaluation scope is preserved during the whole evaluation. 4 - You misinterpret my example; when I'm changing some indexing/schema configuration in an heavy way, I like to be able to test it before, etc; these changes (like changing stemmer, synonyms, etc) require reindexing. So, the version I'm talking about is the 'index data' version. To be strict, I do consider modifyng the schema a "major" version and modifying config a "minor" one wrt to index reindex/reload. But that would make the example a bit terse. 5 - I believe I did not remove any public method in the patch, did I? 6 - The visitor pattern used in the original substituteSystemProperties has just been 'methodified'. The code is no different than what it was and it allows to implement just what's needed by derivation.
          Hide
          Henri Biestro added a comment - - edited

          Modified property definition syntax to follow ant's.
          Renamed test class.
          Made the include an experimental feature.

          Show
          Henri Biestro added a comment - - edited Modified property definition syntax to follow ant's. Renamed test class. Made the include an experimental feature.
          Hide
          Noble Paul added a comment - - edited

          understand that there are mixed fillings about it though; what about putting it in with an "experimental" state? (big warning & boolean in solr.xml).

          The scope of this issue actually does not include that. That is why we can raise this as a separate issue. I understand that there are use cases which this may serve. I am more concerned about how it can promote practices which are undesirable.

          We are very close to a release . So it is wiser to stick to the bare necessities than adding an experimental feature. Other features which are included in this issue have a go and we should not hold up the release for the want of a larger discussion and consensus.

          we atleast have two '-1' on this .Ryan says 'drop it' and shalin is '-1'

          Show
          Noble Paul added a comment - - edited understand that there are mixed fillings about it though; what about putting it in with an "experimental" state? (big warning & boolean in solr.xml). The scope of this issue actually does not include that. That is why we can raise this as a separate issue. I understand that there are use cases which this may serve. I am more concerned about how it can promote practices which are undesirable. We are very close to a release . So it is wiser to stick to the bare necessities than adding an experimental feature. Other features which are included in this issue have a go and we should not hold up the release for the want of a larger discussion and consensus. we atleast have two '-1' on this .Ryan says 'drop it' and shalin is '-1'
          Hide
          Shalin Shekhar Mangar added a comment -

          0 - I do use today an include mechanism of some sort so my field types are not defined in the schema; this allows to automate the schema generation from another tool & not have to deal with copying content.

          I agree that it may be useful for you but it is a big change to have at this point of time. It does not belong to this issue anyway. The changes in CoreAdminHandler are useful but it does not fit the description of this issue. Please raise another issue for these functionalities.

          1 - the CoreContainer evaluator is a "transient" being; it keeps track of properties as they are "read" and how they are evaluated during configuration. It is not stored nor kept beyond the scope of load(). I dont see what you mean by house keeping. The things we keep afterwards are property expressions (1 map in CoreContainer & 1 map per CoreDescriptor) and 1 Properties instance per SolrResourceLoader.

          Keeping the global expression in CoreContainer and the core-specific expressions in CoreContainer is fine. The evaluator does not need to store any information. It just needs to evaluate an expression found in a DOM node using the properties passed in to the substitute method.

          5 - I believe I did not remove any public method in the patch, did I?

          Yes, that was an oversight on my part. Sorry about that.

          6 - The visitor pattern used in the original substituteSystemProperties has just been 'methodified'. The code is no different than what it was and it allows to implement just what's needed by derivation.

          I don't see why that is necessary at all. Truth be spoken, I feel we can do without Evaluator itself in it's current form.

          The problem statement is "We have a DOM tree where a node may have an expression which needs to be replaced by a property value". The existing DOMUtil is already doing all this but only with a value determined from System.getProperty. The only extra functionality that we need is to look up the expression from a Properties object and if none found, then look up in System.getProperty. The CoreContainer and CoreDescriptor need to keep a Properties object which has the values core-specific, implicit and global in that fall-back order. The parsing and writing code is to be added to CoreContainer.

          I don't see why anything more would be necessary for solving this problem. I'm not comfortable with the large amount of changes when we are so close to releasing 1.3

          Show
          Shalin Shekhar Mangar added a comment - 0 - I do use today an include mechanism of some sort so my field types are not defined in the schema; this allows to automate the schema generation from another tool & not have to deal with copying content. I agree that it may be useful for you but it is a big change to have at this point of time. It does not belong to this issue anyway. The changes in CoreAdminHandler are useful but it does not fit the description of this issue. Please raise another issue for these functionalities. 1 - the CoreContainer evaluator is a "transient" being; it keeps track of properties as they are "read" and how they are evaluated during configuration. It is not stored nor kept beyond the scope of load(). I dont see what you mean by house keeping. The things we keep afterwards are property expressions (1 map in CoreContainer & 1 map per CoreDescriptor) and 1 Properties instance per SolrResourceLoader. Keeping the global expression in CoreContainer and the core-specific expressions in CoreContainer is fine. The evaluator does not need to store any information. It just needs to evaluate an expression found in a DOM node using the properties passed in to the substitute method. 5 - I believe I did not remove any public method in the patch, did I? Yes, that was an oversight on my part. Sorry about that. 6 - The visitor pattern used in the original substituteSystemProperties has just been 'methodified'. The code is no different than what it was and it allows to implement just what's needed by derivation. I don't see why that is necessary at all. Truth be spoken, I feel we can do without Evaluator itself in it's current form. The problem statement is "We have a DOM tree where a node may have an expression which needs to be replaced by a property value". The existing DOMUtil is already doing all this but only with a value determined from System.getProperty. The only extra functionality that we need is to look up the expression from a Properties object and if none found, then look up in System.getProperty. The CoreContainer and CoreDescriptor need to keep a Properties object which has the values core-specific, implicit and global in that fall-back order. The parsing and writing code is to be added to CoreContainer. I don't see why anything more would be necessary for solving this problem. I'm not comfortable with the large amount of changes when we are so close to releasing 1.3
          Hide
          Henri Biestro added a comment -

          Given that:
          1 - I've wasted too much time of too many people with a feature & code that's too complex/wide
          2 - The time & added delay it will take for me to create the sub-issues (for the 2 bugs in CoreContainer, 1 bug & 1 rfe for persistence), modify this issue description & examples and create the code (although I'm not sure that's needed or a desirable)
          3 - There is an alternate patch code ready that a committer has written that he's comfortable with that solves the property expansion problem and he has been kind enough not to commit it already

          Let's use the efficient route:
          1 - Push this solr-646 issue to a 'future release' date so it can be revisited if ever needed or at least serve as an example of "things to not do when you contribute"
          2 - Create a new issue for 1.3 that is solved by the alternative patch, commit it, close the issue & release 1.3

          Show
          Henri Biestro added a comment - Given that: 1 - I've wasted too much time of too many people with a feature & code that's too complex/wide 2 - The time & added delay it will take for me to create the sub-issues (for the 2 bugs in CoreContainer, 1 bug & 1 rfe for persistence), modify this issue description & examples and create the code (although I'm not sure that's needed or a desirable) 3 - There is an alternate patch code ready that a committer has written that he's comfortable with that solves the property expansion problem and he has been kind enough not to commit it already Let's use the efficient route: 1 - Push this solr-646 issue to a 'future release' date so it can be revisited if ever needed or at least serve as an example of "things to not do when you contribute" 2 - Create a new issue for 1.3 that is solved by the alternative patch, commit it, close the issue & release 1.3
          Hide
          Otis Gospodnetic added a comment -

          Henri - this is very big of you! It's good to see this flexibility.

          Show
          Otis Gospodnetic added a comment - Henri - this is very big of you! It's good to see this flexibility.
          Hide
          Shalin Shekhar Mangar added a comment -

          Henri – This is indeed very big of you! I appreciate your support very much. Thank you!

          I shall create a new issue for this and attach a patch there.

          Show
          Shalin Shekhar Mangar added a comment - Henri – This is indeed very big of you! I appreciate your support very much. Thank you! I shall create a new issue for this and attach a patch there.
          Hide
          Shalin Shekhar Mangar added a comment -

          Per Henri's suggestion, marking this for 1.4 – the core issues will be taken care of in SOLR-716 in 1.3 itself

          Show
          Shalin Shekhar Mangar added a comment - Per Henri's suggestion, marking this for 1.4 – the core issues will be taken care of in SOLR-716 in 1.3 itself
          Hide
          Hoss Man added a comment -

          I've attempted to catch up on the comments on this issue, and somewhere i got pretty lost (i haven't even tried looking at the patches yet) but FWIW: there is at least the huge use cases i know of in favor of supporting include/import related to master/slave differences...

          1. only masters should trigger snapshot creation in postCommit/postOptimize hooks, only slaves should use QuerySenderListener to do static warming of things in firstSearcher/newSearcher
          2. different cache configs for master/slave – want smaller caches on maters, with little/no autowarm, and in some cases don't want certain user caches to exist at all
          3. don't register certain intensive request handlers on masters at all, so people don't inadvertantly hose the master by querying it directly.

          ...some of these things might be achieved by having a lot of properties, but ideally you want one property with a value of either "master" or "slave" and use that as part of hte name of fragment files to include.

          At CNET, we've been using m4 macros to generate master/slave configs from the same template file since forever, but it would certainly be nice to have Solr take care of this natively – and it would make promoting a slave to a master in failover situations trivial (just bounce the port with one JNDI variable changed)

          BUT Solr users have lived without this feature for along time, not need to rush it if hte parties involved think it's better to hold off until after 1.3

          Show
          Hoss Man added a comment - I've attempted to catch up on the comments on this issue, and somewhere i got pretty lost (i haven't even tried looking at the patches yet) but FWIW: there is at least the huge use cases i know of in favor of supporting include/import related to master/slave differences... only masters should trigger snapshot creation in postCommit/postOptimize hooks, only slaves should use QuerySenderListener to do static warming of things in firstSearcher/newSearcher different cache configs for master/slave – want smaller caches on maters, with little/no autowarm, and in some cases don't want certain user caches to exist at all don't register certain intensive request handlers on masters at all, so people don't inadvertantly hose the master by querying it directly. ...some of these things might be achieved by having a lot of properties, but ideally you want one property with a value of either "master" or "slave" and use that as part of hte name of fragment files to include. At CNET, we've been using m4 macros to generate master/slave configs from the same template file since forever, but it would certainly be nice to have Solr take care of this natively – and it would make promoting a slave to a master in failover situations trivial (just bounce the port with one JNDI variable changed) BUT Solr users have lived without this feature for along time, not need to rush it if hte parties involved think it's better to hold off until after 1.3
          Hide
          Henri Biestro added a comment -

          Updated to be based on SOLR-716 commit. Thanks Shalin.

          Some improvement over SOLR--716 :

          • in solr.xml, /Solr/cores/core attributes can use properties defined in /solr or system (including name/aliases)
          • persistence of property expressions (rather than values)
          • All core descriptor members can be expressions (related to SOLR-715 )
          • fixes a bug in persistence related to SOLR-718 commit (will have to create an issue for this)
          • If a core was declared with an expression as its name and its aliases haven't changed, persistence will rewrite the property expression.

          That is, if you solr.xml does something like:

          <core>
          ...
            <cores adminPath="/admin/cores">
              <core name="${en-cores}" instanceDir="./">
          ...
          </core>
          

          The expression '$

          {en-cores}

          ' will be written back (if you did not realias the core of course).

          Some improvements over previous version:
          There are now 2 ways to include files
          One is a tribute to SOLR-712 (Thanks Amit) which means to include through entities with the twist that an entity resolver exploit properties in URIs in the 'solr:' protocol:

          <!DOCTYPE schema [
          <!ENTITY textL10n SYSTEM "solr:${l10ntypes}">
          ]>
          <schema name="example core ${l10n}" version="1.1">
            <types>
             <fieldtype name="string"  class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
             <!--include resource="text-l10n.xml"/-->
             &textL10n;
            </types>
            ...
          </schema>
          

          Or the easier to read way (imho):

          <schema name="example core ${l10n}" version="1.1">
            <types>
              <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
          	<include name='text-l10n.xml'/>
            </types>
          ...
          </schema>
          

          There are tests for both versions.
          Now, we've got options.

          Show
          Henri Biestro added a comment - Updated to be based on SOLR-716 commit. Thanks Shalin. Some improvement over SOLR--716 : in solr.xml, /Solr/cores/core attributes can use properties defined in /solr or system (including name/aliases) persistence of property expressions (rather than values) All core descriptor members can be expressions (related to SOLR-715 ) fixes a bug in persistence related to SOLR-718 commit (will have to create an issue for this) If a core was declared with an expression as its name and its aliases haven't changed, persistence will rewrite the property expression. That is, if you solr.xml does something like: <core> ... <cores adminPath= "/admin/cores" > <core name= "${en-cores}" instanceDir= "./" > ... </core> The expression '$ {en-cores} ' will be written back (if you did not realias the core of course). Some improvements over previous version: There are now 2 ways to include files One is a tribute to SOLR-712 (Thanks Amit) which means to include through entities with the twist that an entity resolver exploit properties in URIs in the 'solr:' protocol: <!DOCTYPE schema [ <!ENTITY textL10n SYSTEM "solr:${l10ntypes}" > ]> <schema name= "example core ${l10n}" version= "1.1" > <types> <fieldtype name= "string" class= "solr.StrField" sortMissingLast= "true" omitNorms= "true" /> <!--include resource= "text-l10n.xml" /--> &textL10n; </types> ... </schema> Or the easier to read way (imho): <schema name= "example core ${l10n}" version= "1.1" > <types> <fieldtype name= "string" class= "solr.StrField" sortMissingLast= "true" omitNorms= "true" /> <include name='text-l10n.xml'/> </types> ... </schema> There are tests for both versions. Now, we've got options.
          Hide
          Shalin Shekhar Mangar added a comment -

          I may not be able to look at this for a week or two. Removing myself as the assignee so that I don't hold back others.

          Show
          Shalin Shekhar Mangar added a comment - I may not be able to look at this for a week or two. Removing myself as the assignee so that I don't hold back others.
          Hide
          Shalin Shekhar Mangar added a comment -

          Marked for 1.5

          Show
          Shalin Shekhar Mangar added a comment - Marked for 1.5
          Hide
          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Show
          Hoss Man added a comment - Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed. A unique token for finding these 240 issues in the future: hossversioncleanup20100527
          Hide
          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Show
          Robert Muir added a comment - Bulk move 3.2 -> 3.3
          Hide
          Robert Muir added a comment -

          3.4 -> 3.5

          Show
          Robert Muir added a comment - 3.4 -> 3.5
          Hide
          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          email notification suppressed to prevent mass-spam
          psuedo-unique token identifying these issues: hoss20120321nofix36

          Show
          Hoss Man added a comment - Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently. email notification suppressed to prevent mass-spam psuedo-unique token identifying these issues: hoss20120321nofix36
          Hide
          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Show
          Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
          Hide
          Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Show
          Uwe Schindler added a comment - Move issue to Solr 4.9.

            People

            • Assignee:
              Unassigned
              Reporter:
              Henri Biestro
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:

                Development