Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      Command to run gennorm2 does not work at present. Also, icupkg needs to be called to convert the binary file to big-endian.

      I will attach a patch.
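For readers following along, the two-step regeneration described above can be sketched roughly as below. This is an illustration only, not the contents of the attached patch (which changes build.xml). The function names and all file names are hypothetical; the sketch assumes `gennorm2` and `icupkg` are on the PATH, that `gennorm2` takes its output file via `-o`, and that `icupkg -tb` writes a big-endian copy of its input.

```python
import subprocess
from pathlib import Path


def build_commands(inputs, tmp_file, dst_file):
    """Return the two argv lists: gennorm2 first, then icupkg."""
    # gennorm2 compiles the normalization source files into one binary file.
    gennorm2 = ["gennorm2", *inputs, "-o", str(tmp_file)]
    # -tb asks icupkg to write its output in big-endian form.
    icupkg = ["icupkg", "-tb", str(tmp_file), str(dst_file)]
    return gennorm2, icupkg


def regenerate(inputs, tmp_file, dst_file):
    """Run both tools in order, then remove the intermediate file."""
    tmp_file = Path(tmp_file)
    try:
        for argv in build_commands(inputs, tmp_file, dst_file):
            subprocess.run(argv, check=True)  # stop at the first failure
    finally:
        tmp_file.unlink(missing_ok=True)  # requires Python 3.8+
```

Calling `regenerate(["a.txt", "b.txt"], "build/utr30.tmp", "utr30.nrm")` would run the two tools in sequence and clean up the temporary file even if one of them fails.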

      Attachments

        1. gennorm2.patch
          2 kB
          David Bowen
        2. gennorm2.patch
          2 kB
          David Bowen
        3. LUCENE-2629.patch
          3 kB
          Robert Muir

        Activity

          dbowen David Bowen added a comment -

          Just a build.xml tweak.

          I included a couple of extra tests for the ICUFoldingFilter, on the basis that more tests can't hurt.

          rcmuir Robert Muir added a comment -

          Perfect, now the file can be easily regenerated. I just tested it.

          (I noticed that, for some strange reason, the <delete> didn't delete utr30.tmp, but I'll figure it out.)

          Thanks a lot!

          dbowen David Bowen added a comment -

          Oops, I just noticed also that the tmpfile was not getting deleted. A stupid typo (${gennorm.tmp} instead of ${gennorm2.tmp}). Here's a fixed patch.
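Why a typo like this goes unnoticed: Ant leaves a reference to an undefined property unexpanded, so the <delete> was pointed at a file literally named ${gennorm.tmp}, which does not exist, and nothing was removed. The snippet below is a toy model of that expansion rule, not Ant's actual implementation; the `expand` helper and the property value are hypothetical.

```python
import re

# Matches ${name} references, capturing the property name.
_REF = re.compile(r"\$\{([^}]+)\}")


def expand(text, props):
    """Replace ${name} with props[name]; leave unknown names untouched."""
    return _REF.sub(lambda m: props.get(m.group(1), m.group(0)), text)


props = {"gennorm2.tmp": "build/utr30.tmp"}  # hypothetical property value

print(expand("${gennorm2.tmp}", props))  # defined property: expanded
print(expand("${gennorm.tmp}", props))   # misspelled property: stays literal
```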

          dbowen David Bowen added a comment -

          And by the way, I tested that it is OK to run icupkg on the file even if it is already big-endian.

          I find it a strange concept to have two binary file formats, one for big-endian and one for little-endian, only one of which is usable. I would have thought that the gennorm2 program should generate the file format that works, no matter what machine it is run on.

          No doubt there are complex reasons for this design weirdness. I know that, sadly, some people still have to deal with EBCDIC.
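To make the two points above concrete (a file in the wrong byte order is unusable, and converting an already big-endian file is harmless), here is a small sketch using a made-up format: one flag byte recording the byte order, followed by 32-bit unsigned integers. This is not ICU's real file layout. ICU data headers do record their byte order, which is presumably how icupkg can tell that no swap is needed, but every name below is hypothetical.

```python
import struct

BIG, LITTLE = 1, 0  # values of the toy format's byte-order flag


def make_blob(values, big_endian):
    """Serialize values as: 1 flag byte, then uint32s in the given order."""
    order = ">" if big_endian else "<"
    flag = BIG if big_endian else LITTLE
    return bytes([flag]) + struct.pack(f"{order}{len(values)}I", *values)


def to_big_endian(blob):
    """Convert a blob to big-endian; a big-endian blob is returned as-is."""
    if blob[0] == BIG:
        return blob  # already in the target order, nothing to swap
    count = (len(blob) - 1) // 4
    values = struct.unpack_from(f"<{count}I", blob, 1)
    return make_blob(values, big_endian=True)


def read_as_big_endian(blob):
    """A reader that, like the consumer here, only understands big-endian."""
    count = (len(blob) - 1) // 4
    return list(struct.unpack_from(f">{count}I", blob, 1))


little = make_blob([1, 258], big_endian=False)
print(read_as_big_endian(little))                 # wrong values: bytes are swapped
print(read_as_big_endian(to_big_endian(little)))  # original values recovered
```

Because the conversion checks the flag first, applying it a second time returns the same bytes unchanged.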

          rcmuir Robert Muir added a comment -

          Thanks David, that did the trick!

          I made one small change: just in case something goes wrong it uses ${build.dir} for the temp file.

          I'd like to commit this soon to trunk and 3x.

          rcmuir Robert Muir added a comment -

          David Bowen wrote: "I find it a strange concept to have two binary file formats, one for big-endian and one for little-endian, only one of which is usable. I would have thought that the gennorm2 program should generate the file format that works, no matter what machine it is run on."

          I could be wrong, but I think ICU's data files are endian-dependent because they are designed to be mapped into memory very quickly (e.g. the speed at which the underlying character property data can be mapped into memory, so that java.lang.Character becomes usable, is performance-sensitive).

          rcmuir Robert Muir added a comment -

          Committed revision 991053 (trunk), 991055 (3x).

          Thanks David!


          gsingers Grant Ingersoll added a comment -

          Bulk close for 3.1
          tomoko Tomoko Uchida added a comment -

          This issue was moved to GitHub issue: #3703.


          People

            Assignee: rcmuir Robert Muir
            Reporter: dbowen David Bowen