Solr / SOLR-5422

Support mask for dynamic fields in the language detection processor


Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved

    Description

      User Story

      I need to stem multilingual documents for indexing.
      I have several fields to stem, and I use the language detection update request processor with langid.map.individual.fl, because the language has to be detected individually for every field. I am having trouble with a multivalued field named tag. I first made this field multivalued because my documents can have several tags.
      But the processor does not detect the language separately for each tag value in the following case:

      "document" : {
      ...
          "tag" : ["spanish", "español"]
      ...
      }
      

      So I changed my schema and made tag a dynamic field:

      "document" : {
      ...
          "tag_1" : "spanish",
          "tag_2" : "español"
      ...
      }
      

      But the language detection processor ignores field patterns like tag_*.
      The number of tags per document is not limited, so I can't set langid.map.individual.fl to tag_1, tag_2, ..., tag_37, because a document may also contain a tag_38 field.

      So I think it would be a useful improvement if the language detection processor supported definitions like

      <str name="langid.fl">blah*, *blahblah</str>
      <str name="langid.map.fl">blah*, *blahblah</str>
      <str name="langid.map.individual.fl">blah*, *blahblah</str>
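
A minimal sketch of how such masks could be expanded against the field names actually present in a document. This is not existing Solr code: the class FieldGlobMatcher and its methods are hypothetical names used only for illustration, and the sketch assumes that * is the only wildcard and that patterns are comma-separated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: expands wildcard field masks
// such as "tag_*" against the field names present in a document.
class FieldGlobMatcher {

    // Convert a mask like "tag_*" into a regex; every character except '*'
    // is quoted so that it matches literally.
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // Return the field names that match at least one mask in a
    // comma-separated list, preserving the order of fieldNames.
    static List<String> expand(String maskList, Iterable<String> fieldNames) {
        List<Pattern> patterns = new ArrayList<>();
        for (String glob : maskList.split(",")) {
            String trimmed = glob.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(globToRegex(trimmed));
            }
        }
        List<String> matched = new ArrayList<>();
        for (String name : fieldNames) {
            for (Pattern p : patterns) {
                if (p.matcher(name).matches()) {
                    matched.add(name);
                    break;
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "title", "tag_1", "tag_2", "tag_38");
        System.out.println(expand("tag_*", fields)); // prints [tag_1, tag_2, tag_38]
    }
}
```

Because the masks are resolved per document, a field such as tag_38 would be picked up without being listed in the configuration.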
      

      Alternatively, there could be a way to tell Solr: "Detect the language of my multivalued field separately for every value."
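
The per-value alternative could look roughly like the sketch below. It is illustrative only: PerValueLangMapper is a hypothetical class, the detector argument stands in for a real language detector, and the toy detector in main (non-ASCII characters mean "es") exists purely to make the example runnable. The target names assume the field-plus-language-suffix form (tag becomes tag_es).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Solr code: detect the language of each value of a
// multivalued field and group the values under "<field>_<lang>".
class PerValueLangMapper {

    static Map<String, List<String>> mapValues(String field,
                                               List<String> values,
                                               Function<String, String> detector) {
        Map<String, List<String>> mapped = new LinkedHashMap<>();
        for (String value : values) {
            // The detector is called once per value, not once per field.
            String lang = detector.apply(value);
            mapped.computeIfAbsent(field + "_" + lang, k -> new ArrayList<>()).add(value);
        }
        return mapped;
    }

    public static void main(String[] args) {
        // Toy detector (assumption for the demo only): any non-ASCII
        // character is treated as a sign of Spanish.
        Function<String, String> toyDetector =
                v -> v.chars().anyMatch(c -> c > 127) ? "es" : "en";

        Map<String, List<String>> result =
                mapValues("tag", Arrays.asList("spanish", "espa\u00f1ol"), toyDetector);
        System.out.println(result.keySet()); // prints [tag_en, tag_es]
    }
}
```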

          People

            Assignee: Unassigned
            Reporter: Irina Gorbunova (vatuska)
            Votes: 0
            Watchers: 1