Lucene - Core / LUCENE-3747

Support Unicode 6.1.0

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.5, 4.0-ALPHA
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      Now that Unicode 6.1.0 has been released, Lucene/Solr should support it.

      JFlex trunk now supports Unicode 6.1.0.

      Tasks include:

      • Upgrade ICU4J to v49 (after it's released, on 2012-03-21, according to http://icu-project.org).
      • Use icu module tools to regenerate the supplementary character additions to JFlex grammars.
      • Version the JFlex grammars: copy the current implementations to *Impl3<X>; make the versioning tokenizer wrappers instantiate these copies when the Version constructor parameter falls between 3.1 and the version in which these changes are released (excluding both endpoints); then change the Unicode version specified in the non-versioned JFlex grammars from 6.0 to 6.1.
      • Regenerate JFlex scanners, including StandardTokenizerImpl, UAX29URLEmailTokenizerImpl, and HTMLStripCharFilter.
      • Using generateJavaUnicodeWordBreakTest.pl, generate and then run WordBreakTestUnicode_6_1_0.java under modules/analysis/common/src/test/org/apache/lucene/analysis/core/
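
      The grammar-versioning task above amounts to a dispatch on the Version parameter. The following is a minimal, self-contained sketch of that dispatch, not the actual Lucene code: the Release enum, the FIRST_WITH_UNICODE_6_1 constant, and the selectScanner method are hypothetical names introduced for illustration, and the choice of 4.0 as the release that ships the change is an assumption. What happens at exactly 3.1 or earlier is not specified in this issue, so the sketch simply falls back to an older implementation there.

```java
public class VersionedScannerSketch {

    /** Hypothetical stand-in for a match-version constant; NOT Lucene's Version class. */
    enum Release { V3_0, V3_1, V3_4, V3_5, V4_0 }

    /** Assumption for illustration: the first release shipping the Unicode 6.1 grammars. */
    static final Release FIRST_WITH_UNICODE_6_1 = Release.V4_0;

    /**
     * Returns a label naming the scanner variant a versioning wrapper would
     * instantiate. Enum constants compare by declaration order.
     */
    static String selectScanner(Release matchVersion) {
        if (matchVersion.compareTo(FIRST_WITH_UNICODE_6_1) >= 0) {
            // The releasing version and later use the regenerated grammar.
            return "current grammar (Unicode 6.1)";
        }
        if (matchVersion.compareTo(Release.V3_1) > 0) {
            // Strictly between 3.1 and the releasing version: frozen copy.
            return "frozen *Impl3<X> copy (Unicode 6.0)";
        }
        // 3.1 and earlier: outside the range described in the task (assumed fallback).
        return "older pre-existing implementation";
    }

    public static void main(String[] args) {
        for (Release r : Release.values()) {
            System.out.println(r + " -> " + selectScanner(r));
        }
    }
}
```

      Keeping a frozen copy per Unicode version means that indexes built with an earlier Version keep tokenizing the same way after the upgrade, while new users get the Unicode 6.1 behavior by default.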

      Attachments

        1. LUCENE-3747-remainders.patch
          22 kB
          Steven Rowe
        2. LUCENE-3747.patch
          678 kB
          Steven Rowe
        3. LUCENE-3747.patch
          1.02 MB
          Steven Rowe

          People

            Assignee: Steven Rowe (sarowe)
            Reporter: Steven Rowe (sarowe)
            Votes: 0
            Watchers: 2