Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      See LUCENE-6993.

      We want to bring all these tokenizers up to date. The ICU part can be done independently.

        Activity

        rcmuir Robert Muir added a comment -

        Here's a patch (does not include regenerated binary changes).

        It bumps the version, removes Khmer syllable segmentation in favor of ICU's Khmer support (and adds a test), and regenerates all data files.

        mikemccand Michael McCandless added a comment -

        +1

        jira-bot ASF subversion and git services added a comment -

        Commit b0a43aa1b2819133ec2ee69545a62358baf440b3 in lucene-solr's branch refs/heads/master from Robert Muir
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b0a43aa ]

        LUCENE-7035: Upgrade icu4j to 56.1/unicode 8.

        jira-bot ASF subversion and git services added a comment -

        Commit fc879d1a5d97fae8e805fb3d194557851539873d in lucene-solr's branch refs/heads/master from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fc879d1 ]

        LUCENE-7035: Also regenerate analysis/common's UnicodeWhitespaceTokenizer (it actually changes nothing, but updates version numbers)

        thetaphi Uwe Schindler added a comment -

        Hi Robert, I also regenerated inside analysis/common, because this one creates the UnicodeWhitespaceTokenizer's data file from icu4j.jar. This actually did not change anything, but the file versions were updated.

        Maybe we should add a message in analysis/icu's build.xml that reminds you to also update the analysis/common files if you update ICU.

        rcmuir Robert Muir added a comment -

        To me that "cache" is independent of icu.


          People

          • Assignee:
            Unassigned
            Reporter:
            rcmuir Robert Muir
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
