Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Skill Level:
      Committers Level (Medium to Hard)

      Description

      The JSON spec has a very loose definition of Number. CouchDB, as a database, should have well-defined and first class support for numbers (both integral and decimal). The precision of number support should be formally specified as should the algorithm used to represent floating-point values, especially where an approximation must be made in the conversion.

        Activity

        Paul Joseph Davis added a comment -

        So this whole thing has really gotten blown out of proportion. While we have never formally documented what's going on internally, it can be described as such:

        A number is parsed into one of two forms:

        If the number contains a decimal point (".") or an exponent ("e" or "E"), it is converted internally into an IEEE-754 floating-point representation. Such numbers are therefore subject to the usual constraint of being represented in a finite number of bits.

        If a number does not contain a decimal point or exponent, it is parsed as an integer with (theoretically) no loss of precision (I think precision is bound by the amount of RAM, IIRC, but I don't promise there aren't any bugs). (Side note for Jiffy: technically, if a number fits in a signed 64-bit representation, that is used. If not, parsing is deferred back to Erlang, which handles it as a bignum.)
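The two-form rule described above can be sketched in a few lines. This is only an illustration in Python, not CouchDB's or Jiffy's actual parser: `parse_json_number` is a hypothetical helper, Python's `float` stands in for the IEEE-754 double, and Python's arbitrary-precision `int` stands in for the Erlang bignum.

```python
import re

# JSON number grammar: optional minus, integer part, optional fraction,
# optional exponent.
_JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def parse_json_number(text: str):
    """Hypothetical helper mirroring the rule described in the comment."""
    if _JSON_NUMBER.fullmatch(text) is None:
        raise ValueError(f"not a JSON number: {text!r}")
    # A decimal point or exponent selects the floating-point form.
    if any(ch in text for ch in ".eE"):
        return float(text)  # IEEE-754 double, finite precision
    # Otherwise the number is an integer of arbitrary size.
    return int(text)


print(parse_json_number("1e2"))                      # 100.0 (a float)
print(parse_json_number("12345678901234567890123"))  # exact integer, no rounding
```

Note that under this rule `1.0` and `1` end up as different internal types even though they denote the same mathematical value.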

        Literally, the only thing that's wrong in COUCHDB-1407 is that number formatting for doubles changed a wee bit, and it has a simple fix. Now people are getting all crazy about numbers and ignoring other places where JSON is munged. Blergh.

        Robert Newson added a comment -

        If all that needs to happen to resolve this ticket is to include what you just said in the documentation (and maybe some tests that prove it is, and remains, true), I'll be quite happy.

        Jason Smith added a comment -

        Bob, the JSON definition of number is not loose. You have numerals, a dot, numerals, an "E", and numerals. That pretty much describes arbitrary-precision decimals. Everybody keeps talking about JSON, but the crucial matter is what Couch does with numbers or, put another way, how we can expect the JSON to change from a PUT to a GET.

        Couch is used with instrumentation and scientific applications. Significant figures matter. It would be nice if Couch maintained them. But there are workarounds, so that is merely a nice-to-have.

        Simply identifying Couch's treatment of numbers is IMHO quite fine.
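To make the significant-figures concern concrete, here is a small sketch using Python's standard `json` module as a stand-in for any parser that converts decimals to IEEE-754 doubles. It does not exercise CouchDB itself, and the `reading` field is made up for the example.

```python
import json
from decimal import Decimal

raw = '{"reading": 1.50}'  # hypothetical instrument reading with 3 significant figures

# Parsed as a double, the trailing zero is not part of the value,
# so it is gone when the document is written back out.
as_double = json.loads(raw)
print(json.dumps(as_double))  # {"reading": 1.5}

# A decimal type keeps the digits exactly as written.
as_decimal = json.loads(raw, parse_float=Decimal)
print(as_decimal["reading"])  # 1.50
```

One possible workaround, offered here as an assumption rather than something named in the thread, is to store such readings as JSON strings (`"1.50"`) and convert them in the application.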

        Robert Newson added a comment -

        I appreciate that the format is fully defined. Perhaps what I mean, instead, is the precision with which those numbers can be manipulated in view servers? I've certainly been stung by some crazy number rounding issues in the past, and I don't think that's reasonable behavior for a database.

        It sounds like this ticket is really two issues: 1) numbers can roundtrip safely to and from JSON, and 2) computations on numbers stay within known (and consistent) bounds.

        Issue 1 is something we need to resolve in ejson for the 1.2.0 release but sounds simple. To fulfill this ticket, we have to commit to not breaking roundtrip safety in future versions.
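A roundtrip check of the kind Issue 1 calls for might look like the following sketch. It uses Python's standard `json` module rather than ejson or CouchDB, and `roundtrips` is a hypothetical helper; the point is only to show the shape of such a test.

```python
import json


def roundtrips(text: str) -> bool:
    """True if decode -> encode -> decode yields the same value and type."""
    first = json.loads(text)
    second = json.loads(json.dumps(first))
    return type(first) is type(second) and first == second


samples = [
    "0.1",
    "1e2",
    "1.7976931348623157e308",          # largest finite double
    "5e-324",                          # smallest positive double
    "123456789012345678901234567890",  # integer beyond 64 bits
]
assert all(roundtrips(s) for s in samples)

# The value survives, but the text need not be byte-identical:
print(json.dumps(json.loads("1e2")))  # 100.0
```

This distinguishes two possible meanings of "roundtrip safety": the numeric value is preserved, or the exact textual form is preserved. The sketch tests only the first.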

        Issue 2, I suspect, is contentious. Or, at least, I suspect I desire stronger numeric handling than JavaScript typically delivers. I'll be happy here if we document, and preserve, some minimal standard.
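As an illustration of the kind of rounding behind Issue 2: JavaScript numbers are IEEE-754 doubles, and so are Python floats, so the sketch below uses Python floats as a stand-in for what a JavaScript view server would see. It is an assumption-laden example, not a test of couchjs.

```python
# Integers are exact in a double only up to 2**53.
big = 2**53 + 1              # 9007199254740993
as_double = float(big)       # rounds to the nearest representable double

print(as_double == float(2**53))  # True: the +1 was lost
print(int(as_double) == big)      # False

# Decimal fractions are approximated as well.
print(0.1 + 0.2 == 0.3)           # False
```

An integer that was stored exactly can therefore change value as soon as it passes through double-based arithmetic, which is one candidate for the "minimal standard" worth documenting.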

        /ramble

        Jason Smith added a comment -

        Is JavaScript relevant? The storage layer reads and writes documents. Everything arrives or departs (whether to the HTTP layer or to a view server) as JSON. What couchjs does with that JSON seems a different issue. What my browser does with the JSON is a different issue.

        I like Paul's point to simply document how it works and leave it at that. It's basically how most databases work. Maybe Couch 3000 would improve on that, but for today...

        Jason Smith added a comment -

        Can this issue be merged with either of these two?

        Big numbers changed to decimals: https://issues.apache.org/jira/browse/COUCHDB-724
        JSON encoding of number changes: https://issues.apache.org/jira/browse/COUCHDB-1407

        Robert Newson added a comment -

        Numbers are hard, let's go shopping.

        Alexander Shorin added a comment -

        Back reference from docs: http://docs.couchdb.org/en/latest/json-structure.html#number-handling

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Newson
          • Votes:
            0
            Watchers:
            3
