SOLR-2939

Clustering of multilingual search results

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: contrib - Clustering
    • Labels:
      None

      Description

      Carrot2 internally supports clustering of multilingual search results. The clustering component should allow passing a language field to Carrot2. This feature would need at least two new parameters: carrot.lang, for the name of the Solr field that contains the language code (ISO 639), and carrot.lcmap, similar to the one in the language recognizer, to map arbitrary strings to ISO 639 codes.
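
      The mapping step described above could look roughly like the sketch below. It is only an illustration: the class and method names are hypothetical (not Solr or Carrot2 API), and the space-separated "from:to" value format is an assumption modeled on the language recognizer's map parameter, not something this issue specifies.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr or Carrot2.
final class LanguageCodeMapSketch {

    // Parses a value such as "POLISH:pl ENGLISH:en" into a lookup table.
    // ASSUMPTION: pairs are whitespace-separated and written as from:to.
    static Map<String, String> parse(String lcmap) {
        Map<String, String> map = new HashMap<>();
        if (lcmap == null || lcmap.trim().isEmpty()) {
            return map;
        }
        for (String pair : lcmap.trim().split("\\s+")) {
            int colon = pair.indexOf(':');
            if (colon <= 0 || colon == pair.length() - 1) {
                throw new IllegalArgumentException("Expected from:to, got: " + pair);
            }
            String from = pair.substring(0, colon);
            String to = pair.substring(colon + 1).toLowerCase(Locale.ROOT);
            map.put(from, to);
        }
        return map;
    }

    // Returns the mapped ISO 639 code, or the raw field value unchanged
    // when no mapping is defined for it.
    static String resolve(Map<String, String> map, String rawFieldValue) {
        if (rawFieldValue == null) {
            return null;
        }
        String mapped = map.get(rawFieldValue);
        return mapped != null ? mapped : rawFieldValue;
    }
}
```

      With such a table, a document whose language field holds an arbitrary string (for example POLISH) resolves to an ISO 639 code (pl), while values that are already codes pass through untouched.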

      Another feature of the language recognizer we should mirror is the expansion of the {lang} token in field names into the language code of the document (in case of multiple languages per document, the first Carrot2-supported language code). The feature seems easy to implement in the non-distributed setting of Solr, but the simple implementation isn't going to work in the distributed setting because the name of the specific field to be fetched depends on the content (language) of each matching document. Looking at the SearchClusteringEngine.getFieldsToLoad(SolrQueryRequest) method, a quick but costly solution would be to load the contents of all stored fields. I'm not too strong in distributed-mode Solr, but maybe this could be optimized so that only the required fields get fetched?
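
      To make the proposed token expansion concrete, here is a minimal sketch of the per-document step. All names are hypothetical, and the set of supported codes is passed in by the caller rather than taken from Carrot2. It also shows why the field name cannot be known before the document's language is read, which is the difficulty in the distributed case.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Solr or Carrot2.
final class LangTokenExpansionSketch {

    static final String TOKEN = "{lang}";

    // Returns the first of the document's language codes that is in the
    // supported set, or null when none of them is supported.
    static String firstSupported(List<String> docLanguages, Set<String> supported) {
        for (String code : docLanguages) {
            if (supported.contains(code)) {
                return code;
            }
        }
        return null;
    }

    // Expands a pattern such as "title_{lang}" into a concrete field name
    // for one document. Returns null when no supported language is found,
    // leaving the fallback policy to the caller.
    static String expand(String fieldPattern, List<String> docLanguages, Set<String> supported) {
        String code = firstSupported(docLanguages, supported);
        if (code == null) {
            return null;
        }
        // String.replace substitutes the literal token, not a regex.
        return fieldPattern.replace(TOKEN, code);
    }
}
```

      Because the result depends on each document's own language values, the set of fields to request cannot be computed from the query alone.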

        Activity

        Uwe Schindler made changes -
        Status           Resolved [ 5 ]    Closed [ 6 ]
        Stanislaw Osinski made changes -
        Field            Original Value    New Value
        Status           Open [ 1 ]        Resolved [ 5 ]
        Fix Version/s                      4.0 [ 12314992 ]
        Resolution                         Fixed [ 1 ]
        Stanislaw Osinski added a comment -

        In trunk and branch_3x. Wiki page updated. The language code variable expansion in field names has not yet been implemented; I'll move it to a dedicated issue.

        Stanislaw Osinski created issue -

          People

          • Assignee:
            Stanislaw Osinski
            Reporter:
            Stanislaw Osinski
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
