
Proper ASCII folding of Danish/Norwegian characters Ø, Å

    Details

    • Lucene Fields:
      Patch Available
    • Review Patch?:
      Yes

      Description

      The current version of ASCIIFoldingFilter folds Å, å to A, a and Ø, ø to O, o, which I believe is incorrect.

      Norway introduced Å in 1917 as a replacement for the digraph Aa (which ASCIIFoldingFilter maps to aa), and Denmark followed in 1948. Aa is still used in many names. For example, the second-largest city in Denmark was originally named Aarhus, was renamed Århus in 1948, and was changed back to Aarhus in 2010 for internationalization purposes.

      The story of Ø is similar. It is equivalent to Œ (which is mapped to oe), not to ö (which is mapped to o), and it is generally written as oe in ASCII text.

      The third Danish character Æ is already properly mapped to AE.
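
      To make the requested behaviour concrete, here is a minimal sketch in plain Java. It is not the attached patch and does not use the Lucene API. The class name NordicFoldingSketch and the method fold are hypothetical, and the uppercase forms AA and OE are an assumption modeled on the existing Æ to AE mapping. It only handles the precomposed characters, not a base letter followed by a combining mark.

```java
import java.util.Map;

// Hypothetical illustration of the folding requested in this issue.
// Not Lucene code: ASCIIFoldingFilter itself works on token buffers.
final class NordicFoldingSketch {

    // Assumed replacement table. Unicode escapes are used so the source
    // compiles regardless of file encoding.
    private static final Map<Character, String> FOLDS = Map.of(
        '\u00C5', "AA",  // Å (assumed uppercase form, by analogy with AE)
        '\u00E5', "aa",  // å
        '\u00D8', "OE",  // Ø (assumed uppercase form, by analogy with AE)
        '\u00F8', "oe",  // ø
        '\u00C6', "AE",  // Æ (matches the mapping the issue says already exists)
        '\u00E6', "ae"); // æ

    // Replaces each listed character with its ASCII digraph and copies
    // every other character through unchanged.
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length() + 4);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = FOLDS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("\u00E5rhus"));           // aarhus
        System.out.println(fold("sm\u00F8rrebr\u00F8d")); // smoerrebroed
    }
}
```

      With this mapping, a lowercased query for århus and one for aarhus fold to the same term, which is the behaviour I would expect from the filter.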

        Attachments

        1. LUCENE-9939.patch
          11 kB
          Jacob Lauritzen

              People

              • Assignee:
                Unassigned
                Reporter:
                jacse Jacob Lauritzen