C++ Standard Library / STDCXX-239

std::num_get::do_get() cannot parse nan, infinity

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
    • Fix Version/s: 4.3.0
    • Component/s: 22. Localization
    • Labels:
      None
    • Environment:

      all

    • Severity:
      Usability

      Description

      Moved from the Rogue Wave bug tracking database:

      ***Created By: sebor @ Apr 04, 2000 07:13:59 PM***
      The num_get<> facet's do_get() members fail to take the special strings [-]inf[inity] and [-]nan into account. The facet reports an error when it encounters such strings. See 7.19.6.1 and 7.19.6.2 of C99 for a list of allowed strings.

      The fix for this will not be trivial due to the messy implementation of the facets. It might be easier just to rewrite them from scratch.

      The testcase below demonstrates the incorrect behavior. Modified test case added as tests/regress/src/test_issue22564.cpp - see p4 describe 22408.

      $ g++ ... test.cpp
      $ a.out 0 1 inf infinity nan INF INFINITY NAN
      sscanf("0", "%lf") --> 0.000000
      num_get<>::do_get("0", ...) --> 0.000000
      sscanf("1", "%lf") --> 1.000000
      num_get<>::do_get("1", ...) --> 1.000000
      sscanf("inf", "%lf") --> inf
      num_get<>::do_get("inf", ...) --> error
      sscanf("infinity", "%lf") --> inf
      num_get<>::do_get("infinity", ...) --> error
      sscanf("nan", "%lf") --> nan
      num_get<>::do_get("nan", ...) --> error
      sscanf("INF", "%lf") --> inf
      num_get<>::do_get("INF", ...) --> error
      sscanf("INFINITY", "%lf") --> inf
      num_get<>::do_get("INFINITY", ...) --> error
      sscanf("NAN", "%lf") --> nan
      num_get<>::do_get("NAN", ...) --> error

      $ cat test.cpp

      #include <iostream>
      #include <locale>
      #include <stdio.h>
      #include <string.h>

      using namespace std;

      int main (int argc, const char *argv[])
      {
          num_get<char, const char*> nget;

          for (int i = 1; i != argc; ++i) {
              double x = 0, y = 0;
              ios::iostate err = ios::goodbit;

              nget.get (argv [i], argv [i] + strlen (argv [i]), cin, err, x);

              if (1 != sscanf (argv [i], "%lf", &y))
                  printf ("sscanf(\"%s\", \"%%lf\") --> error\n", argv [i]);
              else
                  printf ("sscanf(\"%s\", \"%%lf\") --> %f\n", argv [i], y);

              if ((ios::failbit | ios::badbit) & err)
                  printf ("num_get<>::do_get(\"%s\", ...) --> error\n", argv [i]);
              else
                  printf ("num_get<>::do_get(\"%s\", ...) --> %f\n", argv [i], x);
          }
      }

      ***Modified By: sebor @ Apr 09, 2000 09:31:49 PM***
      Fixed with p4 describe 22544. Test case fixed with p4 describe 22545. Closed.

      ***Modified By: leroy @ Mar 30, 2001 03:09:11 PM***
      Change 22544 by sebor@sebor_dev_killer on 2000/04/09 20:30:50

      Added support for inf[inity] and nan[(n-char-sequence)] as described
      in 7.19.6.1, p8 of C99.
      nan(n-char-sequence) currently treated the same as nan due to poor
      implementation of std::num_get<> and supporting classes - fix requires
      at least a partial rewrite of the facet.

      Resolves Onyx #22564 (and the duplicate #22601).

      Affected files ...

      ... //stdlib2/dev/source/src/include/rw/numbrw#17 edit
      ... //stdlib2/dev/source/src/include/rw/numbrw.cc#12 edit
      ... //stdlib2/dev/source/vendor.cpp#17 edit

      ***Modified By: sebor @ Apr 03, 2001 08:46:50 PM***
      It looks like this is actually not a bug and the fix is wrong (even as an extension). Here's some background...

      Subject: Is this a permissible extension?
      Date: Thu, 8 Feb 2001 18:16:18 -0500 (EST)
      From: Andrew Koenig <ark@research.att.com>
      Reply-To: c++std-lib@research.att.com

      To: C++ libraries mailing list
      Message c++std-lib-8281

      Suppose we execute

      double x;

      std::cin >> x;

      at a point where the input stream contains

      NaN

      followed perhaps by other characters.

      One might plausibly expect an implementation to set x to NaN
      on an implementation that supports IEEE floating-point.

      Surely the standard cannot mandate such behavior, because not
      every implementation knows what NaN is. However, on an implementation
      that does support NaN, is such behavior a permitted extension?

      My first attempt at an answer is no, because if I track through the
      standard, I find that the behavior of this statement is defined
      as being identical to the behavior of strtod in c89, and that behavior
      requires at least one digit in the input in order for the input to
      be valid. However, I might have missed something. Have I?

      ***Modified By: sebor @ Apr 03, 2001 08:48:03 PM***
      Subject: Re: Is this a permissible extension?
      Date: Fri, 09 Feb 2001 09:28:25 -0800
      From: Matt Austern <austern@research.att.com>
      Reply-To: c++std-lib@research.att.com
      Organization: AT&T Labs - Research
      References: 1 , 2

      To: C++ libraries mailing list
      Message c++std-lib-8284

      Andrew Koenig wrote:

      > Fred> In "C" locale, only decimal floating-point constants are valid.
      > Fred> So, no NaN nor Infinity is allowed.
      >
      > Yes – I was talking about the default locale.

      Actually, I think that strtod isn't the important part, at least for
      discussing C++. I think that this is an illegal extension in all
      named locales.

      First, let me explain why I said named locales. If you construct
      a locale with locale("foo"), the way it works is that the locale is
      built up out of _byname facets instead of base class facets. Except
      that not all facets have _byname derived classes, so in some cases
      you've still got the default behavior from the facet base class.

      One of the facets that has no _byname variant is num_get<>. So if I
      can construct an argument that the documented behavior of num_get<>
      precludes this extension, I have also proved that this extension is
      impossible in any named locale. This argument does not apply to
      arbitrary locales, since an arbitrary locale may replace any base
      class facet with a facet that inherits from it.

      OK, now the argument I promised, saying that num_get<> can't recognize
      the character string "NaN".

      22.2.2.1.2, paragraph 2: num_get's overloaded conversion function,
      num_get::do_get(), works in three stages.

      (1) It determines conversion specifiers. We're OK so far.
      (2) It accumulates characters from a provided input iterator.
      (3) It uses the conversion specifiers and the characters it has
      accumulated to produce a number.

      Stage 2 is the crucial one. It's described in 22.2.2.1.2/8-10, in
      great detail.

      For each character,
      (a) We get it from a supplied input iterator.
      (b) We look it up in a lookup table whose contents are prescribed
      by the standard. (This has to do with wide characters, but there
      is no exception for the special case where you're reading narrow
      characters.)
      (c) If a character is found in the lookup table, or if it's a decimal
      point or a thousands sep, then it's checked to see if it can
      legally appear in the number at that point. If so, we keep
      accumulating characters.

      The characters in the lookup table are "0123456789abcdefABCDEF+-".
      Library issue 221 would amend that to "0123456789abcdefxABCDEFX+-".
      "N" isn't present in the lookup table, so stage 2 of num_get<>::do_get()
      is not permitted to read the character sequence "NaN".
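The table lookup in step (b) can be sketched roughly as below. This is a simplified model, not the library's code: in_stage2_table and table_prefix_length are hypothetical names, the decimal point and thousands separator default to '.' and ',' as an assumption, and the per-position check from step (c) is deliberately left out, so the function only counts how many leading characters pass the lookup at all.

```cpp
#include <cstddef>
#include <string>

// The characters quoted in the message above; library issue 221 would add
// 'x' and 'X'.
static const std::string stage2_atoms = "0123456789abcdefABCDEF+-";

// Hypothetical helper: true if c passes the stage 2 lookup, i.e. it is in
// the table or is the decimal point or thousands separator.
bool in_stage2_table(char c, char decimal_point = '.', char thousands_sep = ',')
{
    return c == decimal_point || c == thousands_sep
        || stage2_atoms.find(c) != std::string::npos;
}

// Hypothetical helper: number of leading characters of input that pass the
// lookup. The position check of step (c) is not modeled.
std::size_t table_prefix_length(const std::string& input)
{
    std::size_t n = 0;
    while (n < input.size() && in_stage2_table(input[n]))
        ++n;
    return n;
}
```

Under this model table_prefix_length("NaN"), table_prefix_length("nan") and table_prefix_length("inf") are all 0, since 'N', 'n' and 'i' are absent from the table, which is the crux of the argument. A string such as "abc" passes the lookup here only because the position check is omitted.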

      If you want to argue that num_get<>::do_get() is overspecified, I
      wouldn't disagree too violently.

      --Matt

    People

    • Assignee: Unassigned
    • Reporter: Martin Sebor (sebor)

    Time Tracking

    • Original Estimate: 16h
    • Remaining Estimate: 16h
    • Time Spent: Not Specified