CouchDB / COUCHDB-604

_changes feed with ?feed=continuous does not return valid JSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: 0.10
    • Fix Version/s: 2.0.0
    • Component/s: HTTP Interface
    • Labels:
      None
    • Skill Level:
      Committers Level (Medium to Hard)

      Description

      When using the _changes interface via ?feed=continuous, the response is a stream of JSON documents rather than a single valid JSON document:

      {"seq":38,"id":"f473fe61a8a53778d91c38b23ed6e20f","changes":[

      {"rev":"9-d3e71c7f5f991b26fe014d884a27087f"}

      ]}
      {"seq":68,"id":"2a574814d61d9ec8a0ebbf43fa03d75b","changes":[

      {"rev":"6-67179f215e42d63092dc6b2199a3bf51"}

      ],"deleted":true}
      {"seq":70,"id":"75dbdacca8e475f5909e3cc298905ef8","changes":[

      {"rev":"1-0dee261a2bd4c7fb7f2abd811974d3f8"}

      ]}
      {"seq":71,"id":"09fb03236f80ea0680a3909c2d788e43","changes":[

      {"rev":"1-a9646389608c13a5c26f4c14c6863753"}

      ]}

      To be valid, there needs to be a root element (and then an array with comma-separated entries), as in the non-continuous feed:

      {"results":[
      {"seq":38,"id":"f473fe61a8a53778d91c38b23ed6e20f","changes":[

      {"rev":"9-d3e71c7f5f991b26fe014d884a27087f"}

      ]},
      {"seq":68,"id":"2a574814d61d9ec8a0ebbf43fa03d75b","changes":[

      {"rev":"6-67179f215e42d63092dc6b2199a3bf51"}

      ],"deleted":true},
      {"seq":70,"id":"75dbdacca8e475f5909e3cc298905ef8","changes":[

      {"rev":"1-0dee261a2bd4c7fb7f2abd811974d3f8"}

      ]},
      {"seq":71,"id":"09fb03236f80ea0680a3909c2d788e43","changes":[

      {"rev":"1-a9646389608c13a5c26f4c14c6863753"}

      ]},

      In short, this means that if someone does not parse the change events line by line (e.g. waiting for a line ending and then parsing that line), but instead uses a SAX-like parser (which emits an event for each new object, etc.) and expects the response to be valid JSON (which it is not, because it is {}{}{} rather than {"x":[{},{},{}]}), an error is thrown.

      I can see that people parsing line by line might be okay with the above approach, but the response is not valid JSON, and it would be nice if there were a flag to make the response valid JSON.
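      A minimal Python sketch of the difference (feed contents abbreviated from the examples above): parsing the concatenated stream fails, while parsing each line succeeds.

      import json

      # Two change rows as they arrive on the continuous feed (abbreviated).
      stream = (
          '{"seq":38,"id":"f473","changes":[{"rev":"9-d3e7"}]}\n'
          '{"seq":68,"id":"2a57","changes":[{"rev":"6-6717"}],"deleted":true}\n'
      )

      # Parsing the whole response as one document fails: {}{}{} is not valid JSON.
      try:
          json.loads(stream)
      except ValueError as err:
          print("whole stream:", err)

      # Parsing line by line works, because each non-empty line is a complete object.
      for line in stream.splitlines():
          if line:
              print("row:", json.loads(line))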

        Activity

        Benoit Chesneau added a comment -

        I'm not sure I follow; each line is valid JSON. The purpose of the continuous feed is to get changes line by line, so where is the problem?

        Joscha Feth added a comment -

        Exactly. Each line is valid JSON, but the response itself is not valid JSON.

        To read the response line by line, a method is needed which reads from the never-ending feed until a newline arrives and then parses the JSON object. To do this, the reading method needs to block (as it does not know whether the next character is a newline or not). If you treat JSON as a structured format, it is much easier and more suitable (because no additional information about newlines, etc. is needed) to read such a feed in a SAX-like manner, which throws events whenever there are new elements within the feed:

        Starting seq
        Starting id
        Starting changes
        Starting rev
        Closing rev
        ...

        but such a parser needs the document (or rather the stream) to be valid JSON, which the current output of the continuous feed is not.

        Robert Newson added a comment -

        The feed format used to be as you wish it, and still is for non-continuous mode. Consider the uses of continuous mode, though, and this makes sense. Continuous mode is for continuous replication and for external indexers (e.g., couchdb-lucene, but also many others). If the response were a single JSON object, those users would have to wait for the end of the response before they could parse the result, and there is no end to a continuous feed in general.

        If there exists a SAX-style json parser, then that might be another approach, but I'm not aware of one.

        This programming model seems so trivial that even if one did exist, I can't imagine it would be easier:

        while (read(line) != EOF) {
            change = parse(line);
            apply(change);
        }
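
        A runnable Python version of that loop, sketched with the third-party requests library (the URL and database name are placeholders):

        import json

        import requests  # third-party: pip install requests

        # Stream the continuous feed; iter_lines() blocks until a full line arrives.
        resp = requests.get(
            "http://127.0.0.1:5984/mydb/_changes",
            params={"feed": "continuous", "heartbeat": 5000},
            stream=True,
        )
        for raw in resp.iter_lines():
            if not raw:                  # empty lines are heartbeats
                continue
            change = json.loads(raw)     # parse(line)
            print(change["seq"])         # apply(change)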
        Robert Newson added a comment -

        Also, since the continuous format is deliberately not "valid JSON" in the manner described, this is not a bug but a feature request.

        Benoit Chesneau added a comment -

        You want to use feed=longpoll if you want a full JSON response. For continuous mode, on the application side you just need to wait for data received on the socket. There is an example in Python here:

        http://bitbucket.org/benoitc/couchdbkit/src/tip/couchdbkit/consumer.py

        As you can see, this is just a select on the socket, waiting for its state to change (i.e. data arrives). You could do the same in any language. Then every line you get is perfectly valid JSON.
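
        A schematic Python sketch of that select pattern (host, port, and database are placeholders; a real consumer must also parse the HTTP response headers and chunked framing, which are glossed over here):

        import select
        import socket

        sock = socket.create_connection(("127.0.0.1", 5984))
        sock.sendall(
            b"GET /mydb/_changes?feed=continuous HTTP/1.1\r\n"
            b"Host: 127.0.0.1:5984\r\n\r\n"
        )

        buf = b""
        while True:
            # Block until the socket is readable, with a 60-second timeout.
            readable, _, _ = select.select([sock], [], [], 60.0)
            if not readable:
                break                  # nothing arrived for a while; stop listening
            chunk = sock.recv(4096)
            if not chunk:
                break                  # server closed the connection
            buf += chunk
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                if line.strip():
                    print(line)        # each non-empty body line is one JSON object
        sock.close()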

        Joscha Feth added a comment -

        The programming paradigm you propose (while(read(line))) has a blocking problem: the read call never returns if there is no new data on the stream.
        Think of the following scenario:

        You have a _changes feed with a filter applied which lets changes from specific documents pass, e.g.

        _changes?feed=continuous&listenOnly=f473fe61a8a53778d91c38b23ed6e20f

        now what if your application becomes interested in different IDs over time:

        _changes?feed=continuous&listenOnly=abc,xyz

        The answer is: you can't change the source while still reading from it (as the only option to cleanly exit the while loop is returning/breaking within its body).
        To solve this problem in Java, for example, you can use java.nio, which allows you to have a selector on the socket that returns once there is data available to read.
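
        The same idea in Python's selectors module (a sketch; handle() is a stub, and the caller decides via should_stop() when to exit and re-request the feed with a different filter):

        import selectors
        import socket

        def handle(data: bytes) -> None:
            # Stub: a real consumer would buffer and split on newlines here.
            print(data)

        def listen(sock: socket.socket, should_stop) -> None:
            sel = selectors.DefaultSelector()
            sel.register(sock, selectors.EVENT_READ)
            try:
                while not should_stop():
                    # Wait up to 1s for data; an empty result on timeout lets
                    # the loop re-check should_stop() instead of blocking forever.
                    for key, _ in sel.select(timeout=1.0):
                        data = key.fileobj.recv(4096)
                        if not data:
                            return     # server closed the feed
                        handle(data)
            finally:
                sel.close()
                sock.close()

        When should_stop() flips, the loop exits cleanly and a new _changes request with the new IDs can be opened.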

        Robert Newson added a comment -

        I use _changes?heartbeat=5000&feed=continuous, so I get a response at least every 5 seconds, even if no documents change.

        See http://github.com/rnewson/couchdb-lucene/blob/master/src/main/java/com/github/rnewson/couchdb/lucene/ViewIndexer.java handleResponse() method for a working code example.

        Benoit Chesneau added a comment -

        Two solutions:

        1) Use select on the socket, which returns when there is an answer (this is what I do for the consumer), and just stop listening on the socket when you don't need it anymore.

        2) Or wait for each line and make a new request each time an answer comes, which is basically longpolling.

        Robert Newson added a comment -

        Again, though, the typical use of _changes is to keep some other system up to date with all changes (another database for replication, a full-text index for couchdb-lucene, etc.).

        Joscha Feth added a comment -

        Regarding the

        "If there exists a SAX-style json parser, then that might be another approach, but I'm not aware of one."

        Just because it does not exist "in the wild" does not necessarily mean there isn't one - we have a JSON push parser which can be compared to a SAX-like parser - it does not get a reader, but uses a writer - and once data is written to that writer, the parser does its work (and emits the events named above).
        This push parser now has an extension to handle continuous writes of distinct JSON objects ({}{}{}...) like the _changes feed delivers in continuous mode, but it might be different with another parser out there which expects the feed to return valid JSON data.
        That there is no end does not mean that there might not be a parser which cannot work with the data:

        Start Document
        Starting seq
        Starting id
        Starting changes
        Starting rev
        Closing rev
        ...
        ... <-- lots of changes
        ...
        long time later
        End Document <-- timeout of _changes feed happens here

        By the way, XMPP works the same way - it basically has an infinitely long stream of XML elements flowing - but it still starts with a root node and ends with a closing one to be valid XML.

        You might call it a feature request, but I think the output should either be valid JSON or not claim to be. If you tell me the output is not valid JSON, okay, but I couldn't tell this from the docs, as all the other _changes interfaces return valid JSON.

        Joscha Feth added a comment -

        Using &heartbeat=1000 was also my first approach, but it is just not performant to send a newline every 1000 ms only to keep a while loop running.

        Benoit Chesneau added a comment -

        Obviously you can't have a valid JSON response on a continuous feed if you don't read line by line. Again, what you want is longpolling, which sends a valid response: wait until there is a change, then close the connection.

        For a long timeout you can use heartbeat=true, which sends an empty line regularly (at the default interval). You can also use the timeout option instead of heartbeat.

        Joscha Feth added a comment -

        That's exactly what I am talking about: the response is not valid JSON.

        It is:

        {}
        {}
        {}
        ...

        (not valid JSON)
        where it should be

        {root:[
        {},
        {},
        {},
        ...
        ]} <-- timeout happens here

        (valid JSON)

        And I don't want to use longpolling, as this means I need to reconnect after every change.
        What I want to use is ?feed=continuous (so I can track multiple changes with one HTTP request) with &timeout=X (and NOT heartbeat=Y, which would just waste bandwidth) AND a valid JSON response which is parseable.

        So if you think parsing line by line is okay, that's fine, but for a SAX-based parser a newline outside an element is just whitespace - let me change this into a feature request for an additional query parameter on the _changes interface - let's say
        &encapsulated=true|false
        which is false by default and if true wraps the response into a

        {"results":[
        ...
        ],"last_seq":0}}

        and delimits the different changes elements by commas.

        Benoit Chesneau added a comment -

        If you listen to the changes, why not build valid JSON while you are listening, i.e.: save the line, add a comma, continue.

        I prefer to have valid JSON line by line rather than having to remove a trailing comma each time I want to use a line.
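
        A Python sketch of that client-side fix-up (wrap_continuous is a hypothetical helper; deriving last_seq from the final row is a simplification):

        import json

        def wrap_continuous(lines):
            # Rebuild a document shaped like the non-continuous feed from
            # continuous-feed lines; empty heartbeat lines are skipped.
            results = [json.loads(line) for line in lines if line.strip()]
            last_seq = results[-1]["seq"] if results else 0
            return {"results": results, "last_seq": last_seq}

        feed = ['{"seq":38,"id":"f473","changes":[{"rev":"9-d3e7"}]}', ""]
        print(json.dumps(wrap_continuous(feed)))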

        Joscha Feth added a comment -

        I am not reading lines; I am reading whatever gets returned into the buffer and then writing it to the push parser, and yes, indeed, my first fix for this was writing a {"results":[ to the parser and then transforming every \n into a comma. But it just does not seem right - CouchDB should return (or at least should be able to return) a valid JSON document itself.
        Anyway, I notice you are not with me here, and it seems odd that I need to argue for adding a flag to make a response valid which currently comes back invalid.
        We have written enough, so I am sure someone having the same issues will find this conversation and may reopen this defect. Then he or she can try convincing you again.

        Brian Candler added a comment -

        > By the way, XMPP works the same - it basically has an infinitely long stream of XML elements
        > flowing - but still starts with a root node and ends with a closing one to be valid XML.

        Stream parsing is much more common in the XML world; unfortunately JSON stream parsers are not (yet?) widespread.

        > You might call it a feature request, but I think either the output should be valid JSON or not.
        > If you tell me the output is not valid JSON, okay, but I couldn't read this from the docs, as all
        > other _changes interfaces return valid JSON.

        It's not valid JSON, and I would agree it's a bug if you're referring to the Content-Type header which is returned:

        $ curl -v -H "Accept: application/json" http://127.0.0.1:5984/test_suite_db/_changes?feed=continuous
        ...
        < HTTP/1.1 200 OK
        < Transfer-Encoding: chunked
        < Server: CouchDB/0.11.0a813819 (Erlang OTP/R12B)
        < Date: Mon, 21 Dec 2009 13:35:52 GMT
        < Content-Type: application/json
        < Cache-Control: must-revalidate
        <

        I would also agree that it's inconsistent with the way a view is returned. Views are complete JSON docs, but have newlines in magic places to make it possible to parse them line-at-a-time.

        $ curl -H "Accept: application/json" http://127.0.0.1:5984/test_suite_db_a/_all_docs
        {"total_rows":5,"offset":0,"rows":[
        {"id":"6d85c1ca41bd9eb4435b2fcac3670b84","key":"6d85c1ca41bd9eb4435b2fcac3670b84","value":{"rev":"1-d4388236888cf366f37777e888221968"}},
        {"id":"927c96f86e2fdebbcae3520ac2054fbd","key":"927c96f86e2fdebbcae3520ac2054fbd","value":{"rev":"1-d4388236888cf366f37777e888221968"}},
        {"id":"bar","key":"bar","value":{"rev":"6-7c014f6deb5cd9935625a8f411f8db08"}},
        {"id":"c134ca85221694a28f9ad8953b3087a7","key":"c134ca85221694a28f9ad8953b3087a7","value":{"rev":"1-d4388236888cf366f37777e888221968"}},
        {"id":"foo","key":"foo","value":{"rev":"21-b94c260d79b638231eb14b3de8458c2f"}}
        ]}

        It would have been possible for feed=continuous to work this way too.

        In that case I'd have thought the comma should appear at the start of each line (apart from the first, obviously); otherwise you need a dummy record at the end to close the stream cleanly. And if you want the heartbeat feature you'll still need a dummy record.

        Brian Candler added a comment -

        > if you want the heartbeat feature you'll still need a dummy record.

        A simple newline would still be fine as a heartbeat for 'gets' readers, of course. What I meant was: if you are using a stream parser you probably won't get an event for just reading a newline, so the heartbeat would have to send some real content to be useful in that case.

        Joscha Feth added a comment -

        I don't need the heartbeat feature, as I use a non-blocking method to read the stream - what I want is simply an option to make CouchDB return valid JSON on the _changes continuous feed.
        I agree about the commas, by the way - they definitely need to be at the beginning of the next element.

        Robert Newson added a comment -

        Continuous mode used to return valid JSON, and clients were expected to clip out valid lines (using the newline hints) if they wanted to parse each change as it happened. It was deliberately changed to the current behavior to allow clients to handle _changes more easily, since all consumers of _changes at that time needed the same logic.

        Changing it back will make it "correct" but harder to use (every consumer will have to add hackish logic to parse part of the response). Since the request is predicated on "SAX-style JSON parsers", which apparently don't exist, it feels like purism for its own sake, with practicality taking a step down.

        Perhaps it suffices to change the returned content-type to multipart/related where each part is a valid JSON response?

        That said, I would hope it suffices to document the returned format for continuous mode with the expectation that it will be the preferred format for developers that write external indexers (which is the reason it was changed to the current format in the first place).

        Joscha Feth added a comment -

        OK, look, just because you never saw a Yeti, that doesn't mean it does not exist.
        I have a SAX-style JSON parser at hand here which is way more efficient than any of the bean-based parsers out there - but that's really not the point here.

        I think if the response says it is JSON, it should obey the standard, whether that makes it harder to use or not. This is not purism, but simply a well-defined interface. Everything else is just a dirty hack which no one should rely on.
        A multipart response might work, but might also be overkill here; I agree Benoit is right that most people will be fine using the line-by-line interface.
        Documenting the deviation from the JSON format is a good start, I think, but what is the reason for not just having a flag which makes the output valid JSON?

        Benoit Chesneau added a comment -

        What you describe is, by the way, the old behavior of the _changes continuous feed. I agree that we fail validation in the timeout scenario (though it's easy to solve on the client side). The one problem I see with the old behaviour is the extra work when you really want continuous changes, i.e.:

        1) test if the line starts with {"results
        2) test if the line is {}
        3) remove the ","
        4) finally deserialize.

        Actually you just have to listen for changes, get the line, and handle it, which is way easier and faster. And you can still build valid JSON at the end. We need to choose what is best here: validation or fast handling.

        Joscha Feth added a comment -

        I know I am the "outsider" here, but I disagree - having a clean and valid interface is as important as speed. +1 vote for the flag (as this is both fast and valid when used with a SAX-style parser).

        Benoit Chesneau added a comment -

        I just proposed a choice. We will see what others say...

        I still fail to see how a SAX-style parser and JSON could work together, but that's another story. Anyway, your response will never be valid if it times out; your parser will still need to complete the response. Maybe indeed sending a multipart response is more valid here.

        Joscha Feth added a comment -

        As for the SAX style parser:

        { x: "abc", y: "xyz" }

        would be:

        startDocument
        startElement x
        value abc
        endElement
        startElement y
        value xyz
        endElement
        endDocument

        If the client is the one closing the connection, it will most likely close it when all the data it needs has been read --> valid state
        When the timeout occurs, the server (CouchDB) will send the closing ],"last_seq":XYZ}, just like it does now --> valid state
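
        For illustration, the third-party ijson package provides exactly this kind of event stream (a sketch; ijson is one concrete example, not the parser described above):

        import io

        import ijson  # third-party incremental JSON parser: pip install ijson

        doc = io.BytesIO(b'{"x": "abc", "y": "xyz"}')

        # ijson.parse yields (prefix, event, value) tuples as data streams in:
        # start_map, map_key x, string abc, map_key y, string xyz, end_map.
        for prefix, event, value in ijson.parse(doc):
            print(prefix, event, value)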

        Benoit Chesneau added a comment -

        I see, though a timeout could also occur on the client side -> no valid response. Also, valid state != valid JSON in the case where you close the connection.

        Anyway, I understand your position. Not sure what needs to be done yet, though.

        Paul Joseph Davis added a comment -

        I've met the Yeti. His name is Yajl. He's quite friendly, but no one ever really mentions the SAX thing much. Cause no one ever really produces partial JSON documents. Cause there really aren't that many SAX parsers. Cause as it turns out, SAX is a huge friggin pain in the butt. Granted, this 'push parser' sounds more like a pull-DOM parser. Either way, this discussion is pretty weird for being about commas.

        The validity of that response with that content-type is debatable. AFAIK there's nothing in the specification that says you can't repeat values in a JSON stream. And there are parsers that allow for parsing multiple values. The XMPP 'infinite document' is just a clever hack around a format that explicitly denies repeated documents.

        If someone wants commas in that stream, then they should feel free to write a patch. But commas have nothing to do with non-blocking I/O or other such things.

        Cheerio!

        Joscha Feth added a comment -

        Haha, I didn't even know Yajl existed. Glad you met the Yeti!
        About the partial JSON documents: you are right, it's a rare case, but in general I would call such an endless changes feed a partial JSON document.
        You are right about the (non-)blocking - I just wanted to make sure you understand why I am not eager to search for newlines in the feed: for me it is not a standard response with a single JSON document per line, but one giant stream of JSON. I don't care whether there is any whitespace between elements, be it newlines or something else.

        Chris Anderson added a comment -

        When we made the change away from commas I was a little concerned about validity. Now that you are bringing it up, I find myself thinking that a query option to provide valid JSON would be great.

        I think the general sentiment is that it wouldn't hurt to give an option. The next step is for someone to write a patch for this. If the patch is clean, I'd have no objections to including it.

        Damien Katz added a comment -

        Wow, lots of comments on this one.

        I originally implemented this as a single JSON stream; it was switched to newline-separated JSON objects for ease of parsing by clients. I don't have an opinion one way or the other, but the thing that starts to bother me is the culture of offering more options so that everyone can have it exactly as they want it.

        This stems from the increasing "texture" of the API: what must get documented and tested, and the burden of what must get implemented by those who want to make compatible CouchDB implementations. I tend to favor simpler APIs, to the point of occasionally pushing some complexity to the client, to ensure the server itself isn't completely overloaded with complexity and options.

        Joscha Feth added a comment -

        The more options, the more possible errors, yes - but making the client fix the validity of the response still seems a little weird.

        A client which parses line by line can still easily strip the JSON header (which is always the first line and the same length, by the way) and disregard any leading commas, because such a client interprets each changes entry anyway - whereas a parser working on the overall stream, or a client which just wants all changes in the last X seconds and waits for the timeout, has more trouble making the response valid afterwards.

        But I agree it simplifies the process for some if it is readable line by line, so this might just be one of those cases where an option is necessary (don't get me wrong, I would also be fine if valid JSON output were simply the default).

        Dirkjan Ochtman added a comment -

        For me, the non-valid-JSON (i.e. object-per-line without exceptions) is fine (and probably more useful than the alternative), EXCEPT that the Content-Type: application/json header seems wrong for that case. I think "Content-Type: application/json" should only be used for content that can easily (trivially) be parsed by a conforming JSON parser, which is definitely not true for the current feed=continuous feed.

        Randall Leeds added a comment -

        If someone else wants to close this as "Won't Fix", be my guest as that seems to be the consensus.
        I'm taking the uncontroversial stance of setting "Fix Version" to 2.0 so we can revisit this (if we want) and hopefully it doesn't remain open indefinitely.

        Robert Newson added a comment - edited

        First couchdb committer to reach retirement can take on this and the million niggling deviations from The Canonical Truth Of HTTP. Or knitting, at the retiree's option.


          People

          • Assignee: Unassigned
          • Reporter: Joscha Feth
          • Votes: 1
          • Watchers: 2
