
Key: STDCXX-435
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Critical
Assignee: Martin Sebor
Reporter: Mark Brown
Votes: 0
Watchers: 0
C++ Standard Library

[Linux] std::codecvt_byname("*.UTF-8").in() to_next greater than expected

Created: 04/Jun/07 04:55 AM   Updated: 17/Apr/08 02:33 PM
Component/s: 22. Localization
Affects Version/s: 4.1.3, 4.1.4, 4.2.0
Fix Version/s: 4.2.1

Time Tracking:
Original Estimate: 8h
Remaining Estimate: 2h
Time Spent: 6h

File Attachments:
stdcxx-435.patch (text file, 5 kB), attached 2008-03-12 05:08 PM by Eric Lemings
Environment: gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)

Severity: Incorrect Behavior
Resolution Date: 12/Mar/08 11:03 PM


Description
When compiled with gcc 4.1.1 on Linux, the program below runs successfully to completion, as it should. When compiled with stdcxx, the facet returns a to_next value that is greater than the number of internal (wchar_t) characters actually produced by the conversion, and consequently the program aborts.
$ cat t.cpp && make t && ./t
#include <cassert>
#include <cwchar>
#include <locale>

int main ()
{
    const std::locale utf8 ("en_US.UTF-8");
    typedef std::codecvt<wchar_t, char, std::mbstate_t> UTF8_Cvt;

    const UTF8_Cvt &cvt = std::use_facet<UTF8_Cvt>(utf8);

    const char src[] = "abc";
    wchar_t dst [2] = { L'\0' };

    const char* from_next;

    wchar_t* to_next;

    std::mbstate_t state = std::mbstate_t ();

    const std::codecvt_base::result res =
        cvt.in (state,
                src, src + 1, from_next,
                dst, dst + 2, to_next);

    assert (1 == from_next - src);
    assert (1 == to_next - dst);
    assert ('a' == dst [0]);
}

gcc -c -I/home/mbrown/stdcxx/include/ansi -D_RWSTDDEBUG    -I/home/mbrown/stdcxx/include -I/build/mbrown/stdcxx-gcc-4.1.1-11S/include -I/home/mbrown/stdcxx/examples/include  -pedantic -nostdinc++ -g   -W -Wall -Wcast-qual -Winline -Wshadow -Wwrite-strings -Wno-long-long -Wcast-align   t.cpp
t.cpp: In function 'int main()':
t.cpp:21: warning: unused variable 'res'
gcc t.o -o t  -L/build/mbrown/stdcxx-gcc-4.1.1-11S/lib  -lstd11S -lsupc++ -lm 
t: t.cpp:26: int main(): Assertion `1 == from_next - src' failed.
Aborted


Comments
Martin Sebor added a comment - 04/Jun/07 11:42 PM
This bug is actually what's behind the problem described in STDCXX-333.

Martin Sebor added a comment - 04/Jun/07 11:48 PM
The problem seems to be caused by the fact that in libc mode (i.e., when using the underlying C library) codecvt_byname calls mbsrtowcs() (via __rw_libc_do_in) to convert the source sequence without bothering to make sure it's NUL-terminated. The function attempts to convert the source sequence up until the terminating NUL (or an invalid byte) or until it has produced the requested number of destination characters. When the destination buffer is large enough for more than the number of characters in the source sequence, the function just keeps converting past the end.
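
For illustration only, one way to keep the conversion inside [from, from_end) is to convert one character at a time with mbrtowc(), which takes an explicit byte count, instead of relying on a terminating NUL. The sketch below is not the attached patch and not stdcxx code: bounded_in is a hypothetical helper, it assumes the global C locale already selects the desired encoding, and it assumes an embedded NUL occupies exactly one byte (true for UTF-8 and ASCII).

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>

// Hypothetical helper (not part of stdcxx): converts at most the bytes in
// [from, from_end) into at most the wide characters in [to, to_end).
// Uses the encoding of the current global C locale.
std::codecvt_base::result
bounded_in (std::mbstate_t &state,
            const char *from, const char *from_end, const char *&from_next,
            wchar_t *to, wchar_t *to_end, wchar_t *&to_next)
{
    assert (from <= from_end && to <= to_end);

    from_next = from;
    to_next   = to;

    while (from_next != from_end && to_next != to_end) {
        // Remember the state so it can be restored if nothing is consumed.
        const std::mbstate_t saved = state;

        // The third argument caps how many source bytes may be examined,
        // so the call never reads at or beyond from_end.
        const std::size_t n =
            std::mbrtowc (to_next, from_next,
                          std::size_t (from_end - from_next), &state);

        if (n == std::size_t (-1)) {   // invalid byte sequence
            state = saved;
            return std::codecvt_base::error;
        }

        if (n == std::size_t (-2)) {   // incomplete trailing character
            state = saved;
            return std::codecvt_base::partial;
        }

        // mbrtowc returns 0 for a converted NUL; assume it used one byte.
        from_next += n ? n : 1;
        ++to_next;
    }

    // Source bytes left over means the destination buffer filled up first.
    return from_next == from_end ? std::codecvt_base::ok
                                 : std::codecvt_base::partial;
}
```

Called with the arguments from the test case in the description (src, src + 1, dst, dst + 2), this helper stops after one source byte, so from_next - src and to_next - dst are both 1 regardless of how much room remains in the destination.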

Martin Sebor added a comment - 04/Jun/07 11:49 PM
This is critical because it affects all UTF-8 files. Scheduled for 4.2.0.

Martin Sebor added a comment - 10/Oct/07 07:04 PM
It's too late to do this in time for 4.2.0. Rescheduled for 4.2.1.

Martin Sebor added a comment - 23/Jan/08 05:36 AM
Added a code tag.

Martin Sebor added a comment - 23/Jan/08 05:38 AM
Added an ending code tag.

Eric Lemings added a comment - 12/Mar/08 05:08 PM
Martin's patch for this issue.

Martin Sebor added a comment - 12/Mar/08 11:03 PM
Patch applied in r636534.
Regression test committed in r636553.

Marking as Resolved.
Will merge out to 4.2.1 and close after we confirm that everything looks good in nightly builds.


Farid Zaripov added a comment - 17/Apr/08 10:15 AM
The fix and regression test have been merged into the 4.2.x branch: http://svn.apache.org/viewvc?view=rev&revision=648752

Martin Sebor added a comment - 17/Apr/08 02:31 PM
Added 4.1.3 and 4.2.0 to Affects Version/s.