[STDCXX-239] std::num_get::do_get() cannot parse nan, infinity - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
Fix Version/s: 4.3.0
Component/s: 22. Localization
Labels: None
Environment: all
Severity: Usability

Description

Moved from the Rogue Wave bug tracking database:

***Created By: sebor @ Apr 04, 2000 07:13:59 PM***
The num_get<> facet's do_get() members fail to take the special strings [-]inf[inity] and [-]nan into account. The facet reports an error when it encounters such strings. See 7.19.6.1 and 7.19.6.2 of C99 for a list of allowed strings.

The fix for this will not be trivial due to the messy implementation of the facets. It might be easier just to rewrite them from scratch.

The testcase below demonstrates the incorrect behavior. Modified test case added as tests/regress/src/test_issue22564.cpp - see p4 describe 22408.

$ g++ ... test.cpp
$ a.out 0 1 inf infinity nan INF INFINITY NAN
sscanf("0", "%lf") --> 0.000000
num_get<>::do_get("0", ...) --> 0.000000
sscanf("1", "%lf") --> 1.000000
num_get<>::do_get("1", ...) --> 1.000000
sscanf("inf", "%lf") --> inf
num_get<>::do_get("inf", ...) --> error
sscanf("infinity", "%lf") --> inf
num_get<>::do_get("infinity", ...) --> error
sscanf("nan", "%lf") --> nan
num_get<>::do_get("nan", ...) --> error
sscanf("INF", "%lf") --> inf
num_get<>::do_get("INF", ...) --> error
sscanf("INFINITY", "%lf") --> inf
num_get<>::do_get("INFINITY", ...) --> error
sscanf("NAN", "%lf") --> nan
num_get<>::do_get("NAN", ...) --> error

$ cat test.cpp

#include <iostream>
#include <locale>
#include <stdio.h>
#include <string.h>

using namespace std;

int main (int argc, const char *argv[])
{
    num_get<char, const char*> nget;

    for (int i = 1; i != argc; ++i) {
        double x = 0, y = 0;
        ios::iostate err = ios::goodbit;

        // parse the argument with the facet, then with sscanf for comparison
        nget.get (argv [i], argv [i] + strlen (argv [i]), cin, err, x);

        if (1 != sscanf (argv [i], "%lf", &y))
            printf ("sscanf(\"%s\", \"%%lf\") --> error\n", argv [i]);
        else
            printf ("sscanf(\"%s\", \"%%lf\") --> %f\n", argv [i], y);

        if ((ios::failbit | ios::badbit) & err)
            printf ("num_get<>::do_get(\"%s\", ...) --> error\n", argv [i]);
        else
            printf ("num_get<>::do_get(\"%s\", ...) --> %f\n", argv [i], x);
    }
}

***Modified By: sebor @ Apr 09, 2000 09:31:49 PM***
Fixed with p4 describe 22544. Test case fixed with p4 describe 22545. Closed.

***Modified By: leroy @ Mar 30, 2001 03:09:11 PM***
Change 22544 by sebor@sebor_dev_killer on 2000/04/09 20:30:50

Added support for inf[inity] and nan[(n-char-sequence)] as described
in 7.19.6.1, p8 of C99.
nan(n-char-sequence) currently treated the same as nan due to poor
implementation of std::num_get<> and supporting classes - fix requires
at least a partial rewrite of the facet.

Resolves Onyx #22564 (and the duplicate #22601).

Affected files ...

... //stdlib2/dev/source/src/include/rw/numbrw#17 edit
... //stdlib2/dev/source/src/include/rw/numbrw.cc#12 edit
... //stdlib2/dev/source/vendor.cpp#17 edit

***Modified By: sebor @ Apr 03, 2001 08:46:50 PM***
It looks like this is actually not a bug and the fix is wrong (even as an extension). Here's some background...

Subject: Is this a permissible extension?
Date: Thu, 8 Feb 2001 18:16:18 -0500 (EST)
From: Andrew Koenig <ark@research.att.com>
Reply-To: c++std-lib@research.att.com

To: C++ libraries mailing list
Message c++std-lib-8281

Suppose we execute

double x;

std::cin >> x;

at a point where the input stream contains

NaN

followed perhaps by other characters.

One might plausibly expect an implementation to set x to NaN
on an implementation that supports IEEE floating-point.

Surely the standard cannot mandate such behavior, because not
every implementation knows what NaN is. However, on an implementation
that does support NaN, is such behavior a permitted extension?

My first attempt at an answer is no, because if I track through the
standard, I find that the behavior of this statement is defined
as being identical to the behavior of strtod in c89, and that behavior
requires at least one digit in the input in order for the input to
be valid. However, I might have missed something. Have I?
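
The question above can be checked empirically on any given implementation. The following is a minimal sketch; the helper name stream_accepts is an assumption made for illustration. It assumes the global locale has not been changed, so the stream uses the classic "C" locale.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `stream >> double` succeeds on `text`.
// The stream picks up the global locale, which is the classic "C" locale
// unless the program has replaced it.
bool stream_accepts(const std::string& text, double& value) {
    std::istringstream in(text);
    value = 0;
    in >> value;
    return !in.fail();
}
```

Calling stream_accepts("NaN", x) reports whether the library at hand implements the extension being discussed; the result is implementation-specific, so none is claimed here.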

***Modified By: sebor @ Apr 03, 2001 08:48:03 PM***
Subject: Re: Is this a permissible extension?
Date: Fri, 09 Feb 2001 09:28:25 -0800
From: Matt Austern <austern@research.att.com>
Reply-To: c++std-lib@research.att.com
Organization: AT&T Labs - Research
References: 1 , 2

To: C++ libraries mailing list
Message c++std-lib-8284

Andrew Koenig wrote:

> Fred> In "C" locale, only decimal floating-point constants are valid.
> Fred> So, no NaN nor Infinity is allowed.
>
> Yes – I was talking about the default locale.

Actually, I think that strtod isn't the important part, at least for
discussing C++. I think that this is an illegal extension in all
named locales.

First, let me explain why I said named locales. If you construct
a locale with locale("foo"), the way it works is that the locale is
built up out of _byname facets instead of base class facets. Except
that not all facets have _byname derived classes, so in some cases
you've still got the default behavior from the facet base class.

One of the facets that has no _byname variant is num_get<>. So if I
can construct an argument that the documented behavior of num_get<>
precludes this extension, I have also proved that this extension is
impossible in any named locale. This argument does not apply to
arbitrary locales, since an arbitrary locale may replace any base
class facet with a facet that inherits from it.

OK, now the argument I promised, saying that num_get<> can't recognize
the character string "NaN".

22.2.2.1.2, paragraph 2: num_get's overloaded conversion function,
num_get::do_get(), works in three stages.

(1) It determines conversion specifiers. We're OK so far.
(2) It accumulates characters from a provided input iterator.
(3) It uses the conversion specifiers and the characters it has
accumulated to produce a number.

Stage 2 is the crucial one. It's described in 22.2.2.1.2/8-10, in
great detail.

For each character,
(a) We get it from a supplied input iterator.
(b) We look it up in a lookup table whose contents are prescribed
by the standard. (This has to do with wide characters, but there
is no exception for the special case where you're reading narrow
characters.)
(c) If a character is found in the lookup table, or if it's a decimal
point or a thousands sep, then it's checked to see if it can
legally appear in the number at that point. If so, we keep
accumulating characters.

The characters in the lookup table are "0123456789abcdefABCDEF+-".
Library issue 221 would amend that to "0123456789abcdefxABCDEFX+-".
"N" isn't present in the lookup table, so stage 2 of num_get<>::do_get()
is not permitted to read the character sequence "NaN".
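
The table lookup can be modeled in a few lines. This is a deliberately simplified sketch: it checks only membership in the quoted table plus a '.' decimal point, and ignores both thousands separators and the test of whether a character may legally appear at that position, so passing it is necessary but not sufficient for stage 2 to accept a character. The names atoms and accepted_prefix_length are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// The stage 2 lookup table as quoted above (before library issue 221).
const std::string atoms = "0123456789abcdefABCDEF+-";

// Hypothetical helper: number of leading characters of `text` that pass the
// table lookup (or are a '.' decimal point). Accumulation stops at the
// first character that fails the lookup.
std::size_t accepted_prefix_length(const std::string& text) {
    std::size_t n = 0;
    while (n < text.size() &&
           (atoms.find(text[n]) != std::string::npos || text[n] == '.')) {
        ++n;
    }
    return n;
}
```

Under this model accepted_prefix_length("NaN") is 0, because 'N' fails the lookup before any character is accumulated, while "12N" stops after the two digits.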

If you want to argue that num_get<>::do_get() is overspecified, I
wouldn't disagree too violently.

--Matt
