Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2
    • Fix Version/s: 1.2
    • Component/s: HTTP Interface
    • Labels:
      None
    • Environment:

      Ubuntu 12.04 (alpha)

      Description

      JSON encoding of Number has changed from 1.0.2 to 1.2. JSON only defines Number but this change causes issues in my app because python decodes the number as an int in 1.2.

      Test case:

      PORT=5985

      curl -X DELETE http://localhost:$PORT/test-floats/
      curl -X PUT http://localhost:$PORT/test-floats/

      curl -X PUT http://localhost:$PORT/test-floats/doc1 -H "Content-Type: application/json" -d '{ "a": 1.0 }'
      curl http://localhost:$PORT/test-floats/doc1

      Run against 1.0.2:

      {"ok":true}
      {"ok":true}
      {"ok":true,"id":"doc1","rev":"1-78e61304147429d3d500aee7806fd26d"}
      {"_id":"doc1","_rev":"1-78e61304147429d3d500aee7806fd26d","a":1.0}

      Run against 1.2:

      {"ok":true}
      {"ok":true}
      {"ok":true,"id":"doc1","rev":"1-78e61304147429d3d500aee7806fd26d"}
      {"_id":"doc1","_rev":"1-78e61304147429d3d500aee7806fd26d","a":1}
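
To see why the change in representation matters to a Python client, here is a minimal sketch using only the standard-library `json` module. The two payload strings are trimmed copies of the responses above (the `_rev` field is omitted); nothing in the sketch talks to CouchDB.

```python
import json

# Trimmed copies of the final response body from each run above.
body_1_0_2 = '{"_id":"doc1","a":1.0}'
body_1_2 = '{"_id":"doc1","a":1}'

old = json.loads(body_1_0_2)["a"]
new = json.loads(body_1_2)["a"]

# The numeric value is the same, but the Python type differs.
print(type(old).__name__, type(new).__name__)  # float int
print(old == new)  # True
```

Code that relies on the type, for example integer versus true division in Python 2, can therefore behave differently against 1.2 even though the stored value is unchanged.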

        Activity

        Jason Smith added a comment -

        As far as this ticket indicates, CouchDB is neither losing precision nor altering the numeric value. It is changing the representation. Thus this is not a bug in CouchDB.

        This ticket's opening sentence makes an inaccurate assumption. JSON does not encode "Numbers" (upper-case), like the data type in JavaScript. JSON encodes "numbers" (lower-case), what sensible people call real numbers. AFAIK the spec makes no assumptions about how you deserialize and represent that value in your hardware and language.

        In other words, just as 1.0 = 1 in arithmetic,

        {"a":1}

        and

        {"a":1.0}

        encode the same thing in JSON. If the numeric type is important to you, then store that in the doc. (More often, you'll just type cast.) Or maybe your JSON decoder has an option to disable DWIMming the type.

        Consider:

        1. What if Couch had returned

        {"a":1.00}

        ? Would that be a bug? Why not? No fair saying 1.00 is the same type as 1.0 because JSON has no types, only a syntax.

        2. Is it a bug if Couch encodes strings greater than 32,767 characters long, the maximum string length in QuickBasic, beyond which there is a runtime error?
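
As one illustration of the "decoder option" idea in the comment above, Python's standard `json.loads` accepts a `parse_int` hook that receives each integer literal as a string. The sketch below assumes the application wants every JSON number as a `float`; the helper name `loads_numbers_as_float` is made up for this example.

```python
import json


def loads_numbers_as_float(text):
    """Decode JSON, turning every integer literal into a float."""
    # parse_int is called with the literal's text, e.g. "1".
    return json.loads(text, parse_int=float)


doc = loads_numbers_as_float('{"a":1,"b":2.5,"c":"1"}')
print(doc)  # {'a': 1.0, 'b': 2.5, 'c': '1'}
```

The trade-off is that integers beyond 2**53 can no longer be represented exactly once they are forced into a float, so this only suits fields that are floats by design.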

        Robert Newson added a comment -

        For my part, I fully accept the notion that we've adhered to the JSON semantics for numbers. However, I think we all recognize that the JSON semantics for numbers are rubbish.

        At the very least, as I noted on the 1.2.0 voting thread, this should be recorded in 'BREAKING CHANGES' but I think we ought to fix it. I will devote some cycles to that end.

        Robert Newson added a comment -

        Parity is restored as easily as:

        diff --git a/src/ejson/encode.c b/src/ejson/encode.c
        index 916a0b7..134ffca 100644
        --- a/src/ejson/encode.c
        +++ b/src/ejson/encode.c
        @@ -146,7 +146,7 @@ final_encode(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
                     }
                     // write the string into the buffer
                     snprintf((char*)ctx.bin.data+ctx.fill_offset, 32,
        -                    "%.16g", number);
        +                    "%.16f", number);
                     // increment the length
                     ctx.fill_offset += strlen((char*)ctx.bin.data+ctx.fill_offset);
                 }

        The obvious downside is that, by not taking the shorter of the %e and %f formats, numbers will be less concisely persisted.
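
The difference between the two format strings can be tried without building CouchDB. The sketch below uses Python's `%` operator, which follows the C `printf` rules for `%g` and `%f`; it is an illustration of the formats only and does not exercise the NIF encoder itself.

```python
# Compare the compact (%g) and fixed (%f) formats on a few sample values.
for value in (1.0, 0.1, 1e-20):
    compact = "%.16g" % value
    fixed = "%.16f" % value
    print(repr(value), compact, fixed)

compact_one = "%.16g" % 1.0  # '1'
fixed_one = "%.16f" % 1.0    # '1.0000000000000000'

# With only 16 digits after the point, a very small value prints as zero.
tiny_fixed = "%.16f" % 1e-20
```

Besides the longer output, the last line shows a further cost of a fixed `%.16f`: magnitudes below about 1e-16 are written as all zeros, whereas `%.16g` switches to exponent notation and keeps them.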

        Jason Smith added a comment -

        Bravo! I stand corrected.

        One doesn't often hear "CouchDB" and "concisely persisted" in the same sentence. Couch typically chooses correctness and simplicity.

        Filipe Manana added a comment -

        This is the patch I just mentioned in the development mailing list regarding this issue.

        It was only barely tested with a few cases, but the full test suite (etap and JavaScript) passes with it.

        Paul Joseph Davis added a comment -

        As mentioned on the dev@ thread, I'm pretty dead set against this approach. While there seems to be some sort of general consensus that storing numbers as uninterpreted bytes and repeating them back is the way to go, it really misses the entirety of the issue.

        First, CouchDB has never claimed to pass numbers around while keeping byte identical representations. This patch attempts to change that drastically with a very large number of consequences that we haven't begun to investigate.

        Secondly, if we were to actually consider going this route then we'd also be obliged to start looking at every other place where we change representations internally as well.

        Thirdly, if we were to do that then we'd also have to get into all of the cases where we're stricter than JSON specifically allows and then try and address all of those issues as well.

        Basically, how about we just fix the encoder to spit out a decimal point and an appropriate amount of precision, and then start documenting our round-tripping limitations?
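
A minimal sketch of the "always emit a decimal point" idea, written in Python for brevity. It is not CouchDB's encoder code: the function name `encode_number` and the choice of 17 significant digits (enough to round-trip an IEEE 754 double) are assumptions made for this example.

```python
import math


def encode_number(value):
    """Format a finite float so the text always reads as a float."""
    if not math.isfinite(value):
        raise ValueError("JSON cannot represent NaN or infinity")
    text = "%.17g" % value
    # %g drops a trailing ".0"; add it back unless the text already
    # carries a fraction or an exponent.
    if "." not in text and "e" not in text:
        text += ".0"
    return text


print(encode_number(1.0))   # 1.0
print(encode_number(2.5))   # 2.5
print(encode_number(1e22))  # 1e+22
```

Values that `%g` already prints with a fraction or exponent pass through unchanged, so only whole-valued floats grow by two characters.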

        Kevin R. Coombes added a comment -

        I've been staying out of the fray so far, but I want to (mostly) endorse
        Paul's summary and suggestion. The heart of my argument is two simple
        points:
        [1] JSON only defines Number. It does not define separate integer and
        floating point numbers.
        [2] CouchDB promises to respond to HTTP requests to PUT and GET data,
        and the return value is documented to be JSON.

        These two points imply that CouchDB only knows about the kinds of data
        structures that JSON defines and supports, and thus can/should make no
        promises about the representation of numbers beyond what you can get
        from JSON.

        If an application depends on the distinction between integers and
        floating point values, then it is up to the person writing the
        application to make sure this distinction survives. As has already been
        pointed out, they can accomplish that goal by storing all numbers as
        (JSON) strings and using their application to decode/eval them. This
        fix requires no changes to the CouchDB code.
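
The store-as-string approach described in the paragraph above might look like the following on the application side. This is a sketch under stated assumptions: the document layout and the helper `decode_number` are hypothetical, and the JSON round trip is simulated locally with the standard `json` module rather than sent to CouchDB.

```python
import json


def decode_number(text):
    """Return an int for integer literals and a float otherwise."""
    try:
        return int(text)
    except ValueError:
        return float(text)


# Hypothetical document: the numeric field is stored as a JSON string.
doc = {"_id": "doc1", "a": "1.0"}

# Simulate the round trip through a JSON store.
restored = json.loads(json.dumps(doc))

value = decode_number(restored["a"])
print(type(value).__name__, value)  # float 1.0
```

Because the server only ever sees a string, the literal `"1.0"` comes back byte for byte, and the application alone decides which numeric type it becomes.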

        I would not even change the encoder to deal with decimal points and
        precision. I would advocate just making sure that the documentation is
        clear on this point. In particular, it is probably necessary to
        document (as a breaking change that may require people to rewrite some
        of their applications) the fact that 1.2 may drop trailing zeros after
        the decimal point.

        You cannot really promise to support different types of numbers without
        radically changing the CouchDB code. You would then have to continually
        fight with JSON to get it to support something that is beyond its
        capabilities. Maintenance would become a nightmare. Let's try to avoid
        that road....

        Kevin

        Robert Newson added a comment -

        Inclined to agree with the general trend that we don't "fix" this. 1.0 and 1 are the same value; libraries that ignore the consequences of JSON's definition of number are broken and should be fixed.

        I'd like to resolve as "Won't Fix" and update BREAKING CHANGES for 1.2.0 to reflect this decision.

        Kevin R. Coombes added a comment -

        +1.0

        Paul Joseph Davis added a comment -

        I'll get to this today. It shouldn't be that hard.

        Paul Joseph Davis added a comment -

        Sorry it took so long for me to get to this.

        Patch is on 1.2.x and master:

        http://git-wip-us.apache.org/repos/asf?p=couchdb.git;a=commitdiff;h=ba271a70b83c6df16af43204c2ba9f4d5ca89711

        make check passes locally and a shell session shows it to be correct.


          People

          • Assignee:
            Unassigned
            Reporter:
            Adam Lofts