Solr
  1. Solr
  2. SOLR-2436

move uimaConfig to under the uima's update processor in solrconfig.xml

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      Solr contrib UIMA has its config just beneath <config>. I think it should move to uima's update processor tag.

      1. SOLR-2436.patch
        7 kB
        Koji Sekiguchi
      2. SOLR-2436.patch
        14 kB
        Koji Sekiguchi
      3. SOLR-2436_2.patch
        13 kB
        Tommaso Teofili
      4. SOLR-2436.patch
        13 kB
        Koji Sekiguchi
      5. SOLR-2436-3.patch
        12 kB
        Tommaso Teofili
      6. SOLR-2436.patch
        22 kB
        Koji Sekiguchi
      7. SOLR-2436-multi-map.patch
        3 kB
        Koji Sekiguchi
      8. SOLR-2436-multi-map.patch
        7 kB
        Koji Sekiguchi

        Activity

        Hide
        Robert Muir added a comment -

        Bulk close for 3.2

        Show
        Robert Muir added a comment - Bulk close for 3.2
        Hide
        Koji Sekiguchi added a comment -

        trunk: Committed revision 1100435.
        3x: Committed revision 1100436.

        Show
        Koji Sekiguchi added a comment - trunk: Committed revision 1100435. 3x: Committed revision 1100436.
        Hide
        Koji Sekiguchi added a comment -

        updated patch that includes a test case for multiple mapping.
        I'll commit shortly.

        Show
        Koji Sekiguchi added a comment - updated patch that includes a test case for multiple mapping. I'll commit shortly.
        Hide
        Koji Sekiguchi added a comment -

        While I was reviewing my requirement:

        http://www.lucidimagination.com/search/document/983a059d6d370ce/uima_fieldmappings_and_solr_dynamicfield

        I found a bug that's been introduced in this ticket. I'd mentioned it at above in this thread, but not corrected at the time.

        Because a type can have multiple features, the mapping definition:

        <lst name="fieldMappings">
          <lst name="mapping">
            <str name="type">org.apache.uima.SentenceAnnotation</str>
            <str name="feature">coveredText</str>
            <str name="field">sentence</str>
          </lst>
        </lst>
        

        should be:

        <lst name="fieldMappings">
          <lst name="type">
            <str name="name">org.apache.uima.SentenceAnnotation</str>
            <lst name="mapping">
              <str name="feature">coveredText</str>
              <str name="field">sentence</str>
            </lst>
          </lst>
        </lst>
        
        Show
        Koji Sekiguchi added a comment - While I was reviewing my requirement: http://www.lucidimagination.com/search/document/983a059d6d370ce/uima_fieldmappings_and_solr_dynamicfield I found a bug that's been introduced in this ticket. I'd mentioned it at above in this thread, but not corrected at the time. Because a type can have multiple features, the mapping definition: <lst name= "fieldMappings" > <lst name= "mapping" > <str name= "type" >org.apache.uima.SentenceAnnotation</str> <str name= "feature" >coveredText</str> <str name= "field" >sentence</str> </lst> </lst> should be: <lst name= "fieldMappings" > <lst name= "type" > <str name= "name" >org.apache.uima.SentenceAnnotation</str> <lst name= "mapping" > <str name= "feature" >coveredText</str> <str name= "field" >sentence</str> </lst> </lst> </lst>
        Hide
        Tommaso Teofili added a comment - - edited

        Thanks Koji

        Show
        Tommaso Teofili added a comment - - edited Thanks Koji
        Hide
        Koji Sekiguchi added a comment -

        trunk: Committed revision 1096315.
        3x: Committed revision 1096316.

        Thanks, Tommaso!

        Show
        Koji Sekiguchi added a comment - trunk: Committed revision 1096315. 3x: Committed revision 1096316. Thanks, Tommaso!
        Hide
        Koji Sekiguchi added a comment -

        A new patch that includes updated CHANGES.txt and README.txt. I'll commit tonight.

        Show
        Koji Sekiguchi added a comment - A new patch that includes updated CHANGES.txt and README.txt. I'll commit tonight.
        Hide
        Koji Sekiguchi added a comment -

        And now I realized there are files (aggregate-uima-config.xml and uima-fields.xml) under src/main/resources/solr/conf. Those seems to be sample config for alchemy api. Patch should include aggregate-uima-config.xml modified if we keep them.

        Show
        Koji Sekiguchi added a comment - And now I realized there are files (aggregate-uima-config.xml and uima-fields.xml) under src/main/resources/solr/conf. Those seems to be sample config for alchemy api. Patch should include aggregate-uima-config.xml modified if we keep them.
        Hide
        Uwe Schindler added a comment -

        When I did it, I missed something. Thank you for the alarm.

        The code in DataImporter.java and DataImportHandler.java is a very bad example because it also supports loading from StringReaders and such stuff solely for testing puposes. Because of this the code is very complicated and parts of it are in both files. The examples in Solr core are much better.

        Show
        Uwe Schindler added a comment - When I did it, I missed something. Thank you for the alarm. The code in DataImporter.java and DataImportHandler.java is a very bad example because it also supports loading from StringReaders and such stuff solely for testing puposes. Because of this the code is very complicated and parts of it are in both files. The examples in Solr core are much better.
        Hide
        Tommaso Teofili added a comment -

        Thanks Uwe for the useful clarification regarding XML resources loading. I agree having such information on the wiki would be good.

        If it is going to commit, it breaks back-compat. I think we need a note for users in CHANGES.txt.

        Yes, and we need to align the README and wiki too.

        Show
        Tommaso Teofili added a comment - Thanks Uwe for the useful clarification regarding XML resources loading. I agree having such information on the wiki would be good. If it is going to commit, it breaks back-compat. I think we need a note for users in CHANGES.txt. Yes, and we need to align the README and wiki too.
        Hide
        Koji Sekiguchi added a comment -

        The patch looks good, Tommaso!

        If it is going to commit, it breaks back-compat. I think we need a note for users in CHANGES.txt.

        Show
        Koji Sekiguchi added a comment - The patch looks good, Tommaso! If it is going to commit, it breaks back-compat. I think we need a note for users in CHANGES.txt.
        Hide
        Koji Sekiguchi added a comment -

        Hi Uwe,

        The problematic snippet regarding XInclude handling has been first introduced in my patch that I borrowed from DIH. When I did it, I missed something. Thank you for the alarm.

        Now we are trying to embed the config in update processor instead of loading it from out of solrconfig.xml, the problematic snippet are gone.

        Show
        Koji Sekiguchi added a comment - Hi Uwe, The problematic snippet regarding XInclude handling has been first introduced in my patch that I borrowed from DIH. When I did it, I missed something. Thank you for the alarm. Now we are trying to embed the config in update processor instead of loading it from out of solrconfig.xml, the problematic snippet are gone.
        Hide
        Uwe Schindler added a comment -

        Or perhaps we need a utility method and pointer to that?

        That's a little bit out of scope. I was thinking about that, too, but there are too many different ways to parse XML (Stax, DocumentBuilder), so I did not do this. The new Resolver and Logging classes already handle those interface implementations for all types of parsing, just initializing the parsers should be in the caller's responsibility. Also XSL is different, so all of them use ResourceLoader, but their code is different.

        It is just important that you just not use the standard copy-paste from XML howtos. A similar code example like above can be shown for Stax and Transformers/Templates. I see no need to make helper classes.

        I will add to wiki later.

        Show
        Uwe Schindler added a comment - Or perhaps we need a utility method and pointer to that? That's a little bit out of scope. I was thinking about that, too, but there are too many different ways to parse XML (Stax, DocumentBuilder), so I did not do this. The new Resolver and Logging classes already handle those interface implementations for all types of parsing, just initializing the parsers should be in the caller's responsibility. Also XSL is different, so all of them use ResourceLoader, but their code is different. It is just important that you just not use the standard copy-paste from XML howtos. A similar code example like above can be shown for Stax and Transformers/Templates. I see no need to make helper classes. I will add to wiki later.
        Hide
        Mark Miller added a comment -

        Or perhaps we need a utility method and pointer to that?

        Show
        Mark Miller added a comment - Or perhaps we need a utility method and pointer to that?
        Hide
        Mark Miller added a comment -

        Maybe we should add my last comment into the Wiki:

        +1

        Show
        Mark Miller added a comment - Maybe we should add my last comment into the Wiki: +1
        Hide
        Uwe Schindler added a comment -

        Maybe we should add my last comment into the Wiki: "Howto load XML from Solr's config resources", to prevent broken code again from appearing (if this no issue here anymore this is fine, I was just alarmed). I had a hard time to fix all XML handling in Solr (DIH is still broken with charsets), but XInclude now works as expected everywhere.

        Show
        Uwe Schindler added a comment - Maybe we should add my last comment into the Wiki: "Howto load XML from Solr's config resources", to prevent broken code again from appearing (if this no issue here anymore this is fine, I was just alarmed). I had a hard time to fix all XML handling in Solr (DIH is still broken with charsets), but XInclude now works as expected everywhere.
        Hide
        Uwe Schindler added a comment - - edited

        Here is the new way to load XML from ResourceLoaders in Solr (taken from Config). This code also intercepts errors and warnings and logs them correctly (parsers tend to write them to System.err):

        public static final Logger log = LoggerFactory.getLogger(Config.class);
        private static final XMLErrorLogger xmllog = new XMLErrorLogger(log);
        
        ...
        
        final InputSource is = new InputSource(loader.openConfig(name));
        is.setSystemId(SystemIdResolver.createSystemIdFromResourceName(name));
        
        final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // only enable xinclude, if a SystemId is available
        if (is.getSystemId() != null) {
        	try {
        		dbf.setXIncludeAware(true);
        		dbf.setNamespaceAware(true);
        	} catch(UnsupportedOperationException e) {
        		log.warn(name + " XML parser doesn't support XInclude option");
        	}
        }
        
        final DocumentBuilder db = dbf.newDocumentBuilder();
        db.setEntityResolver(new SystemIdResolver(loader));
        db.setErrorHandler(xmllog);
        try {
        	doc = db.parse(is);
        } finally {
        	// some XML parsers are broken and don't close the byte stream (but they should according to spec)
        	IOUtils.closeQuietly(is.getByteStream());
        }
        
        Show
        Uwe Schindler added a comment - - edited Here is the new way to load XML from ResourceLoaders in Solr (taken from Config). This code also intercepts errors and warnings and logs them correctly (parsers tend to write them to System.err): public static final Logger log = LoggerFactory.getLogger(Config.class); private static final XMLErrorLogger xmllog = new XMLErrorLogger(log); ... final InputSource is = new InputSource(loader.openConfig(name)); is.setSystemId(SystemIdResolver.createSystemIdFromResourceName(name)); final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); // only enable xinclude, if a SystemId is available if (is.getSystemId() != null ) { try { dbf.setXIncludeAware( true ); dbf.setNamespaceAware( true ); } catch (UnsupportedOperationException e) { log.warn(name + " XML parser doesn't support XInclude option" ); } } final DocumentBuilder db = dbf.newDocumentBuilder(); db.setEntityResolver( new SystemIdResolver(loader)); db.setErrorHandler(xmllog); try { doc = db.parse(is); } finally { // some XML parsers are broken and don't close the byte stream (but they should according to spec) IOUtils.closeQuietly(is.getByteStream()); }
        Hide
        Uwe Schindler added a comment -

        I just looked at the patch, is the SOLR-2436_2.patch still active or replaced by Kojis?

        I ask because:

        +    try{
        +      final InputSource is = new InputSource(loader.openConfig(uimaConfigFile));
        +      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        +      // only enable xinclude, if SystemId is present (makes no sense otherwise)
        +      if (is.getSystemId() != null) {
        +        try {
        +          dbf.setXIncludeAware(true);
        +          dbf.setNamespaceAware(true);
        +        } catch( UnsupportedOperationException e ) {
        +          LOG.warn( "XML parser doesn't support XInclude option" );
        +        }
        +      }
        

        This XInclude Handling is broken (the if-clause never gets executed). We have a new framework that makes XML-Loading from ResourceLoaders working correct, even with relative pathes! Just look at the example committed during the cleanup issue (look at other places in solr where DocumentBuilders or XMLStreamReaders are instantiated. The new Solr way to load such files is a special URI scheme that is internally used to resolve ResourceLoader resources correctly (see SOLR-1656).

        The latest patch looks fine, it embeds the config directly, which seems much more consistent.

        Show
        Uwe Schindler added a comment - I just looked at the patch, is the SOLR-2436 _2.patch still active or replaced by Kojis? I ask because: + try{ + final InputSource is = new InputSource(loader.openConfig(uimaConfigFile)); + DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); + // only enable xinclude, if SystemId is present (makes no sense otherwise) + if (is.getSystemId() != null) { + try { + dbf.setXIncludeAware(true); + dbf.setNamespaceAware(true); + } catch( UnsupportedOperationException e ) { + LOG.warn( "XML parser doesn't support XInclude option" ); + } + } This XInclude Handling is broken (the if-clause never gets executed). We have a new framework that makes XML-Loading from ResourceLoaders working correct, even with relative pathes! Just look at the example committed during the cleanup issue (look at other places in solr where DocumentBuilders or XMLStreamReaders are instantiated. The new Solr way to load such files is a special URI scheme that is internally used to resolve ResourceLoader resources correctly (see SOLR-1656 ). The latest patch looks fine, it embeds the config directly, which seems much more consistent.
        Hide
        Tommaso Teofili added a comment -

        Hello Koji,
        your patch seems fine to me from the functional point of view.

        Just, I don't think the SolrUIMAConfigurationReader should be emptied, I wouldn't remove it preferring to assign to it the simple responsibility of reading args without the previous explicit Node traversing but, as you did, using the "Solr way".
        I also made some fixes to remove warning while getting objects from the NamedList.

        Show
        Tommaso Teofili added a comment - Hello Koji, your patch seems fine to me from the functional point of view. Just, I don't think the SolrUIMAConfigurationReader should be emptied, I wouldn't remove it preferring to assign to it the simple responsibility of reading args without the previous explicit Node traversing but, as you did, using the "Solr way". I also made some fixes to remove warning while getting objects from the NamedList.
        Hide
        Koji Sekiguchi added a comment -

        Hi Tommaso,

        However I think it should be good if it was possible to alternatively get rid of the uimaConfig file defining each parameter inside the Processor with Solr elements (str/lst/int etc.) as well.

        I've done this in the attached patch. Please review it. I'm not confident about the part:

        <lst name="fieldMappings">
          <lst name="mapping">
            <str name="type">org.apache.uima.SentenceAnnotation</str>
            <str name="feature">coveredText</str>
            <str name="field">sentence</str>
          </lst>
        </lst>
        

        the structure is appropriate or not.

        Show
        Koji Sekiguchi added a comment - Hi Tommaso, However I think it should be good if it was possible to alternatively get rid of the uimaConfig file defining each parameter inside the Processor with Solr elements (str/lst/int etc.) as well. I've done this in the attached patch. Please review it. I'm not confident about the part: <lst name= "fieldMappings" > <lst name= "mapping" > <str name= "type" >org.apache.uima.SentenceAnnotation</str> <str name= "feature" >coveredText</str> <str name= "field" >sentence</str> </lst> </lst> the structure is appropriate or not.
        Hide
        Tommaso Teofili added a comment - - edited

        Hello Koji,
        I've tested your patch, I needed to align it to latest patch applied (see SOLR-2387) to make tests work (see attached patch).

        In my opinion the solution you're proposing is better than the current one as it reflects the Solr way of specifying parameters in Handlers.

        However I think it should be good if it was possible to alternatively get rid of the uimaConfig file defining each parameter inside the Processor with Solr elements (str/lst/int etc.) as well.

        Show
        Tommaso Teofili added a comment - - edited Hello Koji, I've tested your patch, I needed to align it to latest patch applied (see SOLR-2387 ) to make tests work (see attached patch). In my opinion the solution you're proposing is better than the current one as it reflects the Solr way of specifying parameters in Handlers. However I think it should be good if it was possible to alternatively get rid of the uimaConfig file defining each parameter inside the Processor with Solr elements (str/lst/int etc.) as well.
        Hide
        Koji Sekiguchi added a comment -

        A new patch attached. This time, uimaConfig is read from out of solrconfig.xml:

        <updateRequestProcessorChain name="uima">
          <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
            <str name="uimaConfig">uima-config.xml</str>
          </processor>
          <processor class="solr.LogUpdateProcessorFactory" />
          <processor class="solr.RunUpdateProcessorFactory" />
        </updateRequestProcessorChain>
        
        Show
        Koji Sekiguchi added a comment - A new patch attached. This time, uimaConfig is read from out of solrconfig.xml: <updateRequestProcessorChain name= "uima" > <processor class= "org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory" > <str name= "uimaConfig" >uima-config.xml</str> </processor> <processor class= "solr.LogUpdateProcessorFactory" /> <processor class= "solr.RunUpdateProcessorFactory" /> </updateRequestProcessorChain>
        Hide
        Tommaso Teofili added a comment -

        Thanks Koji, this was on my TODO list just after 3.1 (together with tests' enhance).
        I'll have a look at your patch and share my comments here.

        Show
        Tommaso Teofili added a comment - Thanks Koji, this was on my TODO list just after 3.1 (together with tests' enhance). I'll have a look at your patch and share my comments here.
        Hide
        Koji Sekiguchi added a comment -

        And if I want to use a form of "/config/updateRequestProcessorChain[@name='uima']/processor/uimaConfig/...", UpdateRequestProcessorFactory need to get the name of the processor chain. This needs API change.

        Show
        Koji Sekiguchi added a comment - And if I want to use a form of "/config/updateRequestProcessorChain [@name='uima'] /processor/uimaConfig/...", UpdateRequestProcessorFactory need to get the name of the processor chain. This needs API change.
        Hide
        Koji Sekiguchi added a comment -

        A draft patch attached. It reuses SolrUIMAConfigurationReader w/ simple modification. Due to it uses XPath (via SolrConfig.getNode()) like "/config/updateRequestProcessorChain/processor/uimaConfig/...", if there are multiple uima update processors in solrconfig.xml, they share one uimaConfig... This is bad.

        Show
        Koji Sekiguchi added a comment - A draft patch attached. It reuses SolrUIMAConfigurationReader w/ simple modification. Due to it uses XPath (via SolrConfig.getNode()) like "/config/updateRequestProcessorChain/processor/uimaConfig/...", if there are multiple uima update processors in solrconfig.xml, they share one uimaConfig... This is bad.

          People

          • Assignee:
            Koji Sekiguchi
            Reporter:
            Koji Sekiguchi
          • Votes:
            2 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development