
Key: STDCXX-285
Type: Bug
Status: Open
Priority: Major
Assignee: Martin Sebor
Reporter: Martin Sebor
Votes: 0
Watchers: 0
C++ Standard Library

localedef fails to generate multibyte characters with the same prefix

Created: 06/Sep/06 12:57 AM   Updated: 06/Sep/06 01:12 AM
Component/s: Utilities
Affects Version/s: 4.1.2, 4.1.3
Fix Version/s: None

Time Tracking:
Not Specified

Environment: all


Description
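The localedef utility fails to generate multibyte characters whose leading byte is the same as the encoding of some single-byte character. The test case below demonstrates the problem: of the three characters defined in the charmap, only <U0041> appears in the generated locale, and <mb_cur_max> is reported as 1 instead of 2.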

$ cat charmap && cat ctype && ./localedef -c -w -f charmap -i ctype /tmp/dummy && LC_ALL=/tmp/dummy ./locale --charmap
CHARMAP
<U0041> \x41
<U0141> \x41\x42
<U0241> \x41\x43
END CHARMAP
LC_CTYPE
END LC_CTYPE
<escape_char> \
<comment_char> #
<code_set_name> charmap
<mb_cur_max> 1

# charmap data:
# charmap name = charmap
# n_to_w_tab_off = 0
# w_to_n_tab_off = 1024
# utf8_to_ext_tab_off = 4096
# xliteration_off = 7168
# wchar_off = 8192
# codeset_off = 8216
# charmap_off = 8224
# codecvt_ext_off = 0

CHARMAP
<U0041> \x41 # L'\x41'
END CHARMAP

# charmap stats:
# number of tables = 1
# number of characters = 1
# number of unused slots = 255 (100% waste)
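
To find the charmap entries affected by this problem, a check along the following lines could be used. This is a minimal sketch and not part of localedef: `CharmapEntry`, `find_prefix_conflicts`, and `test_case_charmap` are hypothetical names introduced here for illustration, and the entries are hard-coded from the test case above rather than parsed from a charmap file.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: one charmap entry, i.e. a symbolic name
// and the byte sequence that encodes it.
struct CharmapEntry {
    std::string symbol;   // e.g. "<U0141>"
    std::string bytes;    // e.g. "\x41\x42"
};

// Returns (single-byte symbol, multibyte symbol) pairs in which the
// single-byte encoding equals the leading byte of the multibyte one.
std::vector<std::pair<std::string, std::string>>
find_prefix_conflicts(const std::vector<CharmapEntry>& entries)
{
    std::vector<std::pair<std::string, std::string>> conflicts;
    for (const CharmapEntry& single : entries) {
        assert(!single.bytes.empty());   // every entry must encode something
        if (single.bytes.size() != 1)
            continue;
        for (const CharmapEntry& multi : entries) {
            if (multi.bytes.size() > 1 && multi.bytes[0] == single.bytes[0])
                conflicts.emplace_back(single.symbol, multi.symbol);
        }
    }
    return conflicts;
}

// The three entries from the charmap in the test case.
std::vector<CharmapEntry> test_case_charmap()
{
    return {
        {"<U0041>", "\x41"},
        {"<U0141>", "\x41\x42"},
        {"<U0241>", "\x41\x43"},
    };
}
```

Calling `find_prefix_conflicts(test_case_charmap())` pairs <U0041> with each of the two multibyte entries, which are exactly the characters missing from the output above.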


Comments
Martin Sebor added a comment - 06/Sep/06 01:12 AM
This is an issue for the ISO-IR-90 character set, which contains such sequences (e.g., <UE002> encoded as \xc1 and <U00C0> encoded as \xc1\x41); see http://svn.apache.org/repos/asf/incubator/stdcxx/branches/4.1.3/etc/nls/charmaps/ISO-IR-90. The single-byte characters in that charmap are marked "(not a real character)" in a comment, but the utility generates entries only for them and not for the multibyte characters that share their prefix. This behavior is actually by design (it is dictated by the layout of the codecvt tables), but it appears to be a problem nonetheless.
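
To illustrate how a table layout can force this behavior, here is a simplified model. It is an assumption made for illustration only and is not the actual stdcxx codecvt table layout: each table has 256 slots, and a slot is either empty, a leaf holding one character, or a link to a next-level table, but never both a leaf and a link. `Slot` and `Tables` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical slot: empty, a character, or a link to another table.
struct Slot {
    enum Kind { Empty, Leaf, Link } kind = Empty;
    wchar_t     wc   = 0;   // meaningful only when kind == Leaf
    std::size_t next = 0;   // table index, meaningful only when kind == Link
};

struct Tables {
    std::vector<std::array<Slot, 256>> tabs;

    Tables() : tabs(1) {}   // start with a single first-level table

    // Tries to store the byte sequence; returns false if a slot that
    // is needed as a link (or as a leaf) is already occupied.
    bool insert(const std::string& bytes, wchar_t wc)
    {
        assert(!bytes.empty());
        std::size_t t = 0;
        for (std::size_t i = 0; i < bytes.size(); ++i) {
            const unsigned char b = static_cast<unsigned char>(bytes[i]);
            const Slot::Kind kind = tabs[t][b].kind;
            if (i + 1 == bytes.size()) {          // final byte: needs a leaf
                if (kind != Slot::Empty)
                    return false;
                tabs[t][b].kind = Slot::Leaf;
                tabs[t][b].wc   = wc;
                return true;
            }
            if (kind == Slot::Leaf)               // prefix is already a character
                return false;
            if (kind == Slot::Empty) {            // allocate the next-level table
                const std::size_t next = tabs.size();
                tabs.emplace_back();              // may reallocate, so re-index below
                tabs[t][b].kind = Slot::Link;
                tabs[t][b].next = next;
            }
            t = tabs[t][b].next;
        }
        return false;                             // not reached for non-empty input
    }

    // Counts the slots of the given kind across all tables.
    std::size_t count(Slot::Kind k) const
    {
        std::size_t n = 0;
        for (const auto& tab : tabs)
            for (const Slot& s : tab)
                if (s.kind == k)
                    ++n;
        return n;
    }
};
```

In this model, inserting \x41 first makes slot 0x41 a leaf, so the later insertions of \x41\x42 and \x41\x43 fail. That leaves one table, one character, and 255 empty slots, the same numbers as in the charmap stats in the description. Storing both kinds of entries would require a slot that can carry a character and a link at the same time.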