Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      for Java 5. Java 5 is based on Unicode 4, which means variable-width encoding: a supplementary character occupies two char values (a surrogate pair).

      Supplementary character support should be fixed for code that works with char/char[].
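
      A minimal sketch of why char-based code mishandles these characters. It uses only java.lang.Character and String methods available since Java 5; the class name SupplementaryDemo and the choice of U+10400 as the sample character are illustrative assumptions, not taken from the attached patches.

```java
// Illustrative only: class name and sample character are assumptions.
final class SupplementaryDemo {
    // U+10400 DESERET CAPITAL LETTER LONG I lies outside the BMP,
    // so it is stored as a surrogate pair (two char values).
    static final String DESERET = new String(Character.toChars(0x10400));

    /** Counts UTF-16 code units; a supplementary character counts twice. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Counts Unicode code points; a surrogate pair counts once. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String text = "a" + DESERET;
        System.out.println(codeUnits(text));   // 3
        System.out.println(codePoints(text));  // 2
        // A lone surrogate is not a letter, so a char-based letter test
        // rejects each half of the pair and the character is dropped.
        System.out.println(Character.isLetter(text.charAt(1)));      // false
        // The int (code point) overload sees the whole character.
        System.out.println(Character.isLetter(text.codePointAt(1))); // true
    }
}
```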

      For example:
      StandardAnalyzer, SimpleAnalyzer, StopAnalyzer, etc. should at least be changed so that they do not remove supplementary characters, or be modified to detect surrogates and handle them correctly.
      LowerCaseFilter should be modified to lowercase supplementary characters correctly.
      CharTokenizer should either be deprecated or changed so that isTokenChar() and normalize() take an int (a code point) instead of a char.
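
      The lowercasing case can be sketched as follows. This is a hedged illustration of code-point-aware lowercasing over a char[] buffer, not the code from the attached patches; the class and method names are hypothetical, and it assumes the lowercase form occupies the same number of char values as the original.

```java
// Illustrative only: CodePointLowerCase is a hypothetical helper, not a Lucene class.
final class CodePointLowerCase {
    /**
     * Lowercases buffer[0..length) in place, one code point at a time.
     * Assumes the lowercase form needs as many chars as the original.
     */
    static void toLowerCase(char[] buffer, int length) {
        for (int i = 0; i < length; ) {
            // Reads one char, or two when a valid surrogate pair starts at i.
            int codePoint = Character.codePointAt(buffer, i, length);
            int lower = Character.toLowerCase(codePoint);
            // toChars writes 1 or 2 chars at position i and returns that count.
            i += Character.toChars(lower, buffer, i);
        }
    }

    public static void main(String[] args) {
        // "A", U+10400 (Deseret capital long I), "B"
        char[] buf = ("A" + new String(Character.toChars(0x10400)) + "B").toCharArray();
        toLowerCase(buf, buf.length);
        // U+10400 lowercases to U+10428; a char-by-char loop would leave it unchanged.
        System.out.println(Integer.toHexString(Character.codePointAt(buf, 1))); // 10428
    }
}
```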

      In all of these cases, the code should remain optimized for the BMP case; supplementary characters should be the exception, but they should still work.
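
      One way to keep the BMP case cheap is to treat every non-surrogate char as a complete code point and only combine chars when a surrogate pair is seen. The sketch below shows that idea with an int-based isTokenChar hook; the class, its tokenize method, and the letter-only token rule are assumptions made for illustration and do not reproduce CharTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical tokenizer, not Lucene's CharTokenizer.
final class CodePointTokenizerSketch {
    /** Hypothetical int-based hook: here, a token is a run of letters. */
    static boolean isTokenChar(int codePoint) {
        return Character.isLetter(codePoint);
    }

    static List<String> tokenize(char[] text) {
        List<String> tokens = new ArrayList<String>();
        int start = -1; // start offset of the current token, or -1 if none
        for (int i = 0; i < text.length; ) {
            char c = text[i];
            int codePoint = c; // fast path: a BMP char is its own code point
            int width = 1;
            // Slow path, taken only when a valid surrogate pair is present.
            if (Character.isHighSurrogate(c)
                    && i + 1 < text.length
                    && Character.isLowSurrogate(text[i + 1])) {
                codePoint = Character.toCodePoint(c, text[i + 1]);
                width = 2;
            }
            if (isTokenChar(codePoint)) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                tokens.add(new String(text, start, i - start));
                start = -1;
            }
            i += width;
        }
        if (start >= 0) {
            tokens.add(new String(text, start, text.length - start));
        }
        return tokens;
    }
}
```

      With this approach the extra work for supplementary characters is limited to one surrogate check per char, and a pair is never split across two tokens.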

      1. testCurrentBehavior.txt
        8 kB
        Robert Muir
      2. LUCENE-1689.patch
        7 kB
        Robert Muir
      3. LUCENE-1689.patch
        19 kB
        Robert Muir
      4. LUCENE-1689.patch
        52 kB
        Robert Muir
      5. LUCENE-1689_lowercase_example.txt
        1.0 kB
        Robert Muir

          Activity

          Robert Muir created issue -
          Robert Muir made changes -
          Field Original Value New Value
          Attachment LUCENE-1689_lowercase_example.txt [ 12410505 ]
          Michael McCandless made changes -
          Fix Version/s 2.9 [ 12312682 ]
          Yonik Seeley made changes -
          Fix Version/s 3.1 [ 12314025 ]
          Fix Version/s 2.9 [ 12312682 ]
          Robert Muir made changes -
          Attachment testCurrentBehavior.txt [ 12410615 ]
          Robert Muir made changes -
          Attachment LUCENE-1689.patch [ 12415190 ]
          Robert Muir made changes -
          Attachment LUCENE-1689.patch [ 12415978 ]
          Robert Muir made changes -
          Attachment LUCENE-1689.patch [ 12415993 ]
          Robert Muir made changes -
          Link This issue incorporates LUCENE-2068 [ LUCENE-2068 ]
          Robert Muir made changes -
          Link This issue incorporates LUCENE-2069 [ LUCENE-2069 ]
          Robert Muir made changes -
          Link This issue incorporates LUCENE-2070 [ LUCENE-2070 ]
          Simon Willnauer made changes -
          Link This issue is related to LUCENE-2094 [ LUCENE-2094 ]
          Simon Willnauer made changes -
          Link This issue is blocked by LUCENE-2183 [ LUCENE-2183 ]
          Simon Willnauer made changes -
          Link This issue incorporates LUCENE-2183 [ LUCENE-2183 ]
          Simon Willnauer made changes -
          Link This issue is blocked by LUCENE-2183 [ LUCENE-2183 ]
          Robert Muir made changes -
          Component/s contrib/analyzers [ 12312333 ]
          Mark Thomas made changes -
          Workflow jira [ 12465769 ] Default workflow, editable Closed status [ 12563562 ]
          Mark Thomas made changes -
          Workflow Default workflow, editable Closed status [ 12563562 ] jira [ 12585151 ]
          Shai Erera made changes -
          Component/s modules/analysis [ 12310230 ]
          Component/s contrib/analyzers [ 12312333 ]
          Robert Muir made changes -
          Fix Version/s 4.1 [ 12321140 ]
          Fix Version/s 4.0 [ 12314025 ]
          Steve Rowe made changes -
          Link This issue incorporates LUCENE-2847 [ LUCENE-2847 ]
          Mark Miller made changes -
          Fix Version/s 5.0 [ 12321663 ]
          Mark Miller made changes -
          Fix Version/s 4.2 [ 12323899 ]
          Fix Version/s 4.1 [ 12321140 ]
          Steve Rowe made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 5.0 [ 12321663 ]
          Fix Version/s 4.2 [ 12323899 ]
          Resolution Fixed [ 1 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Robert Muir
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: