Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
User Story
I need to stem multilingual documents for indexing.
I have several fields to stem, and I use the language detection update request processor with langid.map.individual.fl because I need to detect the language individually for every field. I am having trouble with a multivalued field. There is a field named tag. At first I made this field multivalued, because my documents can have several tags.
But the processor did not detect the language separately for each tag value in the following case:
"document" : { ... "tag" : ["spanish", "español"] ... }
So I changed my schema and made the tag field dynamic:
"document" : { ... "tag_1" : "spanish", "tag_2" : "español" ... }
But the language detection processor ignores fields like tag_*.
The number of tags per document is not limited, so I can't set langid.map.individual.fl to tag_1, tag_2, ..., tag_37, because a document may also contain a tag_38 field.
So I think it would be a useful improvement if the language detection processor supported definitions like
<langid.fl>blah*, *blahblah</langid.fl>
<langid.map.fl>blah*, *blahblah</langid.map.fl>
<langid.map.individual.fl>blah*, *blahblah</langid.map.individual.fl>
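To make the requested behaviour concrete, here is a minimal sketch of how such wildcard definitions could be expanded against the field names of an incoming document. It is not Solr code: the helper name expand_field_patterns and the sample document are hypothetical, and glob-style matching is only an assumption about what blah* and *blahblah should mean.

```python
from fnmatch import fnmatchcase


def expand_field_patterns(patterns, field_names):
    """Return the field names that match any of the comma-separated
    glob patterns, keeping the order of field_names.

    Hypothetical helper for illustration; not part of Solr.
    """
    globs = [p.strip() for p in patterns.split(",") if p.strip()]
    return [
        name
        for name in field_names
        if any(fnmatchcase(name, g) for g in globs)
    ]


# Sample document with an open-ended number of dynamic tag fields.
doc = {"id": "1", "tag_1": "spanish", "tag_2": "español", "tag_38": "idioma"}

# The pattern picks up tag_38 without it being listed explicitly.
print(expand_field_patterns("tag_*", doc.keys()))
```

With this kind of expansion, a single value such as tag_* would cover every tag_N field, however many a document contains.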
Alternatively, it would help if it were possible to tell Solr: "I want you to detect the language of my multivalued field separately for every value."
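The per-value behaviour could look roughly like the sketch below. Everything here is an assumption made for illustration: map_values_by_language is a hypothetical helper, toy_detect is a stand-in that only checks for non-ASCII characters and is not a real language detector, and the field_lang naming (tag_en, tag_es) is just one possible mapping scheme.

```python
from collections import defaultdict


def map_values_by_language(field, values, detect):
    """Group the values of one multivalued field into per-language
    fields named <field>_<lang>, detecting each value on its own.

    Hypothetical helper for illustration; not part of Solr.
    """
    mapped = defaultdict(list)
    for value in values:
        mapped[f"{field}_{detect(value)}"].append(value)
    return dict(mapped)


def toy_detect(text):
    # Placeholder only: treats any non-ASCII text as Spanish.
    # A real setup would call an actual language detector here.
    return "en" if text.isascii() else "es"


print(map_values_by_language("tag", ["spanish", "español"], toy_detect))
# prints {'tag_en': ['spanish'], 'tag_es': ['español']}
```

Each value then ends up in a field that can carry its own language-specific stemmer, which is what the multivalued tag field needs.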