Solr / SOLR-1078

WordDelimiterFilter breaks words incorrectly at Thai vowels

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: Schema and Analysis
    • Labels:
      None
    • Environment:

      Ubuntu 8.10 64bit
      Java 1.6.0_10

      Description

      With any schema.xml configuration, adding

      <filter class="solr.WordDelimiterFilterFactory" />

      to an analyzer causes Thai text to be split at the wrong positions.


      Example 1: "ผู้ ใหญ่ บ้าน"

      Actual result: 0 => "ผ", 1 => "ใหญ", 2 => "บ", 3 => "าน"

      Expected result: 0 => "ผู้", 1 => "ใหญ่", 2 => "บ้าน"


      Example 2: "ผู้ใหญ่บ้าน" (no spaces)

      Actual result: 0 => "ผ", 1 => "ใหญ", 2 => "บ", 3 => "าน" (same as Example 1)

      Expected result: 0 => "ผู้ใหญ่บ้าน"
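
      The outputs above are consistent with the filter treating Thai combining marks (for example the vowel ู and the tone marks ่ and ้, which are Unicode non-spacing marks rather than letters) as delimiters. This is an assumption drawn from the reported tokens, not something the report or the attached patch confirms. The following is a minimal standalone sketch of that behaviour. It is not the WordDelimiterFilter code: the class ThaiSplitSketch and the methods isMark and split are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics a splitter that treats every code point that is
// not a letter or digit as a delimiter. This is NOT Solr's implementation.
final class ThaiSplitSketch {

    // Hypothetical helper: true for Unicode combining marks (Mn, Mc, Me).
    static boolean isMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Collects runs of "word" code points. With keepMarks == false only
    // letters and digits belong to a word; with keepMarks == true combining
    // marks are kept inside the word as well.
    static List<String> split(String text, boolean keepMarks) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean wordChar = Character.isLetterOrDigit(cp) || (keepMarks && isMark(cp));
            if (wordChar) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A delimiter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String spaced = "ผู้ ใหญ่ บ้าน";
        String unspaced = "ผู้ใหญ่บ้าน";
        System.out.println(split(spaced, false));   // marks act as delimiters
        System.out.println(split(spaced, true));    // marks stay in the word
        System.out.println(split(unspaced, false));
        System.out.println(split(unspaced, true));
    }
}
```

      Under this assumption, split(text, false) yields the four fragments reported for both examples, and split(text, true) yields the expected tokens, which suggests that counting combining marks as word characters would be one way to avoid the bad breaks.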


      There's a similar problem with Drupal (http://drupal.org/node/335928)

      1. SOLR-1078.patch
        4 kB
        Yonik Seeley

          People

          • Assignee:
            Unassigned
            Reporter:
            SIriwat Aumngamsup
          • Votes:
            0
            Watchers:
            0