CouchDB / COUCHDB-1367

update_seq does not always reflect the seq of the latest document update

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.1.1
    • Fix Version/s: None
    • Component/s: HTTP Interface
    • Labels:
    • Environment:

      Any

      Description

      Certain operations (currently _revs_limit and _security changes) cause the database header's update_seq to increase even though the by_seq index (and therefore _changes) has not changed, which is confusing given how similarly the two values are named.
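
To make the description concrete, here is a toy model in Python. It is only a sketch: `ToyDb` and its methods are invented for illustration and are not CouchDB's (Erlang) internals, and it assumes plain integer sequence numbers.

```python
class ToyDb:
    """Hypothetical in-memory stand-in for a database header plus by_seq index."""

    def __init__(self):
        self.update_seq = 0   # header counter, as reported by GET /db
        self.by_seq = {}      # seq -> doc id; what _changes iterates over
        self._doc_seq = {}    # doc id -> seq of its latest revision
        self.revs_limit = 1000

    def save_doc(self, doc_id):
        # A document update bumps update_seq and moves the doc's by_seq entry.
        self.update_seq += 1
        old = self._doc_seq.get(doc_id)
        if old is not None:
            del self.by_seq[old]
        self.by_seq[self.update_seq] = doc_id
        self._doc_seq[doc_id] = self.update_seq

    def set_revs_limit(self, limit):
        # A header-only change: update_seq moves, by_seq does not.
        self.revs_limit = limit
        self.update_seq += 1

    def last_seq(self):
        # Highest seq that a full read of the by_seq index would reach.
        return max(self.by_seq, default=0)


db = ToyDb()
db.save_doc("a")
db.save_doc("b")
db.set_revs_limit(1)
db.set_revs_limit(1)
print(db.update_seq, db.last_seq())
```

In this model the two counters drift apart as soon as a header-only operation runs, which is the situation the ticket describes.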

        Activity

        Bob Dionne added a comment -

        Henrik,

        Thanks for the report. We discussed this a bit on irc this morning. So last_seq in the changes feed and update_seq in the db info are not intended to be the same, or at least there's some confusion about the semantics. couchdb-lucene uses continuous changes feeds so it doesn't have access to the last_seq value of a normal changes feed. When update_seq changes due to a call to set_revs_limit it gets out of whack.

        In any event the solution may be to simply add last_seq to the db_info record. It shouldn't be hard to fix and, as you say, this is an edge case. I'm curious: are you setting the revs_limit a lot? If so, what's the use case?

        Bob

        Henrik Hofmeister added a comment -

        No, I'm not. I just made a script ensuring it gets set on all dbs. To save a request I took the "just set it to 1" approach, assuming it wouldn't matter if I set it to 1 several times. This is how I discovered there was such a problem. We had been hitting it randomly before, without ever realizing what exactly the problem was.

        It's true that this specific case is for couchdb-lucene. However, the general use case of being able to predict how far you are from being up to date is not couchdb-lucene specific. I have, for one, created another in-house application that does exactly this while performing a chained map-reduce-like operation. (What I'm saying is: if you want to reap the benefits of the changes feed and be aware of your progress, you'll need the right number.)

        Bob Dionne added a comment -

        sure, I think the confusion is this (from the WIKI):

        last_seq is the sequence number of the last update returned. (Currently it will always be the same as the seq of the last item in results.)

        It doesn't say that this is identical to update_seq. They are two different things, or at least became that way when other features such as set_revs_limit were added. It might be better for now to use last_seq in your app, if you can.

        Paul Joseph Davis added a comment -

        Quite right. update_seq will be incremented for a number of APIs that make changes while not updating documents. There's even an increment_update_seq API that just bumps it directly.

        Having a "last_update" in the db info blob that lists the last update for managing changes feeds might be useful in general and seems like it'd be an easy enough addition.

        Jason Smith added a comment -

        Hi, Bob. You said,

        > couchdb-lucene uses continuous changes feeds so it doesn't have access to the last_seq value of a normal changes feed.

        If the database is deleted during a continuous _changes request, CouchDB will send

        {"last_seq":N}

        and disconnect the client. Perhaps it's not relevant to this bug. But a robust changes listener should be prepared to get a last_seq message.

        I learned this from a bug report in Follow https://github.com/iriscouch/follow/issues/6
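
A listener that tolerates this final message might be sketched as follows. This is an illustration in Python, not code from Follow or couchdb-lucene; the function name `consume_feed` and the sample rows are made up, and the input is assumed to be the text lines of a feed=continuous response.

```python
import json


def consume_feed(lines):
    """Collect change rows until the feed ends or a last_seq message arrives.

    `lines` is any iterable of text lines from a feed=continuous response.
    Returns (rows, last_seq); last_seq is None if the stream simply ended.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are heartbeats, not changes
        msg = json.loads(line)
        if "last_seq" in msg:
            # Final message, e.g. when the database is deleted mid-feed.
            return rows, msg["last_seq"]
        rows.append(msg)
    return rows, None


feed = [
    '{"seq":1,"id":"a","changes":[{"rev":"1-abc"}]}',
    "",
    '{"seq":2,"id":"b","changes":[{"rev":"1-def"}]}',
    '{"last_seq":2}',
]
rows, last_seq = consume_feed(feed)
print(len(rows), last_seq)
```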

        Robert Newson added a comment -

        Oh, ye of little faith. couchdb-lucene certainly can handle the last_seq output (and has since approximately forever). It's just so rarely encountered that it doesn't help with respect to the issue at hand.

        Bob Dionne added a comment -

        I'm a little confused then. It seems if c-l has access to the last_seq then it should be able to determine if there are no changes of interest. I thought the commentary on IRC was that this was not the case because it uses the continuous feed.

        Adding last_seq to the db info request will certainly solve it, though it will look a little redundant to the user, as it will almost always be equal to update_seq.

        Henrik Hofmeister added a comment -

        What I'm puzzled about is: what would I ever need update_seq for? It lets me see that a change has been made, yet the changes view shows me no changes. That only happens when it differs from last_seq, of course, but what could I ever use that number for? That is, a number signalling that I have either updated revs_limit or made some number of other internal API calls? It's absolutely useless, especially since I have no way of finding out what changed.

        update_seq would, in any possible case, be expected by the user to reflect your core feature: the changes feed?

        I'm not making this into a huge problem, but the only real fix for a production-grade product like CouchDB is to not add to the confusion but to fix it (that is, not adding another number to the db info page). Otherwise you get two numbers: one that is useless (update_seq) and one that is what you'd expect (last_seq).

        Dave Cottlehuber added a comment -

        +1 on "last_update" being consistent across replication, db, changes. I can't see anywhere else this is exposed via API. Now where to document that?

        Randall Leeds added a comment -

        I'm confused about why a client listening to continuous _changes should care whether or not update_seq changes without emitting a document modification. I like to think that in the future other sorts of changes might be allowed to surface in that feed. Is there any place we guarantee that seq in the changes feed should be monotonic? I don't think so. It seems to me like this is not a problem.

        Robert Newson added a comment -

        I would hope it's obvious that update_seq must be monotonously incrementing (i.e., it cannot go down).

        You're right in the strict sense here: readers of the changes feed will get a row for each document change, and nothing else. The subtle point I think you've missed is that some applications want to know if they've read all changes up to the current update sequence of the database. Non stale=ok view queries already do this, and couchdb-lucene does too. It turns out that only the view engine can do it correctly in all cases because it knows the last sequence value that affected a document (that is, it doesn't 'see' the change to _security, and thus doesn't block for that change).

        Randall Leeds added a comment -

        Oh, I messed up. Monotonic does not mean what I think it means. I meant it need not increase by intervals of 1 all the time.

        Randall Leeds added a comment -

        I'd propose making the heartbeat option send the last_seq row, but that'd probably break clients that expect a disconnect at that point.

        Robert Newson added a comment -

        Correct, 'monotonically incrementing' means that the number always goes up but does not imply that it always goes up by exactly 1. Because it sorta kinda sounds like that, I clarified with 'never goes down' and tried a synonym for monotonic. The changes feed will certainly have gaps, but no row N will have a lower update_seq than any previously seen row.

        monotonic |ˌmänəˈtänik|
        adjective
        1 Mathematics (of a function or quantity) varying in such a way that it either never decreases or never increases.
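
The distinction between "never goes down" and "goes up by exactly 1" can be checked mechanically. A small Python sketch, with a made-up helper name `never_decreases` and invented sequence values:

```python
def never_decreases(seqs):
    """True if each value is >= the one before it; gaps are allowed."""
    return all(a <= b for a, b in zip(seqs, seqs[1:]))


print(never_decreases([1, 2, 5, 9]))  # gaps, but still monotonic
print(never_decreases([1, 3, 2]))     # goes down, so not monotonic
```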

        Randall Leeds added a comment -

        My gut says it's more confusing to put update_seq and last_seq in the db info. If clients want to be sure they're up to date, they need only pass a heartbeat parameter to the changes feed. If the heartbeats are coming in but no updates are, the feed is up to date.
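
One way a client could apply this idea is sketched below in Python. It is a heuristic illustration, not an established client API: the function name `quiet_heartbeats` and the `needed` threshold are assumptions, and heartbeats are taken to arrive as blank lines on the feed.

```python
def quiet_heartbeats(lines, needed=2):
    """Return True once `needed` consecutive heartbeat lines have been seen.

    `lines` is an iterable of text lines from a continuous changes feed
    opened with a heartbeat parameter. A blank line counts as a heartbeat;
    any non-blank line is treated as a change row and resets the count.
    """
    streak = 0
    for line in lines:
        if line.strip():
            streak = 0  # a change arrived, so the client was not idle
        else:
            streak += 1
            if streak >= needed:
                return True
    return False


print(quiet_heartbeats(['{"seq":1,"id":"a","changes":[]}', "", ""]))
```

Requiring more than one heartbeat in a row is a design choice made here to reduce the chance of declaring the feed idle just before a change arrives.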

        Bob Dionne added a comment -

        I agree, it would appear redundant to the user and confusing as it would most likely always be the same. I like the heartbeat solution, especially because it means we don't need to fix anything in couchdb

        Randall Leeds added a comment -

        I suggested the heartbeat because we could make it look like last_seq, but we don't even need to use heartbeat. We could emit a change that doesn't have a corresponding id and revisions. Although, since the URL is /db/_revs_limit, we could (to use that example) emit something like:

        {"seq":X,"id":"_revs_limit","changes":[]}

        I have no idea how badly that would confuse existing clients, including CouchDB. Putting it on the db info is the least obtrusive from an API standpoint. From a code internals standpoint, I think everything would require some change (with the exception of doing nothing about this). I'm going to step away for now, but if you need any more color swatches I can send over some more samples.

        Jason Smith added a comment -

        I disagree with Randall's gut. The metadata from a /db response is not confusing anybody. Most people poke around for values they care about and ignore the rest. That "committed_update_seq" doesn't seem to bother anybody hopefully demonstrates this point.

        On the other hand, a new type of change object has more impact. For example, if I add ?include_docs=true, what would the .doc field be, if anything?

        So, in other words, IMO "last_seq" in the DB response is perfect. People have seen it in their _changes queries and they will make the right assumptions about its meanings.

        Randall Leeds added a comment -

        For revs_limit we could always surface a change to the document with id "_revs_limit", which is somewhat accurate (though it's not a full doc with revisions). Similar thoughts apply for _security.

        If there's no technical reason why we need to bump the seq for changes that don't modify a document we could just stop doing that. What operations do this currently? How many of these resources have we at one time or another discussed making into a full document?

        Jason Smith added a comment -

        Special-case changes responses remind me of the discussion about local docs (I think WRT _security). It seems worth considering making couch have fewer special cases as it gains features.

        Could the revs limit be set in a _local/* document, which would have standard MVCC semantics (but they don't replicate)? Clients can examine and configure databases with their normal document manipulation functions, communicating with Couch through documents.

        The list of things that arguably belong in _local/ grows. The security object, and apparently now the revs limit value, can still be stored in the file header, but that is only a cache. (Couch might even expose the legacy API and internally convert it to document updates.)

        Is it possible?

        Robert Newson added a comment -

        On reflection, it's couchdb-lucene's bug, not couchdb's. Let me explain.

        CouchDB-Lucene (to give it its grown-up name) compares the update_seq from a GET /dbname to the sequences a background process is indexing through. It then unblocks searcher threads as that process reaches or exceeds the required update_seq. This is, in fact, just silly.

        Instead, a search query should cause a GET /dbname/_changes?since=<latest index checkpoint>. It should block until it consumes the entire response, passing the updates to the indexing process. It can then return a non-stale search result. In the case that the index is fresh, the _changes response contains no rows, and serves only to confirm that the index is fresh. If, as planned, CouchDB-Lucene also runs a _changes?feed=continuous to keep indexes fresh in the background then indexes will simply be fresher than they would be in the CouchDB case.

        I repeat, CouchDB-Lucene's mistake is to only use the feed=continuous variety of the changes feed. This prevents it from knowing when its own index is fresh.

        I will make this change next week and I suggest that this ticket be closed with no further action taken.
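
The query-time flow described in this comment could look roughly like the Python sketch below. It is not couchdb-lucene's actual code: `catch_up`, `fetch_changes`, `index_row` and `fake_fetch` are hypothetical names, and `fake_fetch` only imitates the shape of a normal _changes response so the example runs without a server.

```python
def catch_up(fetch_changes, checkpoint, index_row):
    """Read every change after `checkpoint`, index it, return the new checkpoint.

    fetch_changes(since) stands in for GET /dbname/_changes?since=<since>
    and must return the decoded JSON body: {"results": [...], "last_seq": N}.
    """
    body = fetch_changes(checkpoint)
    for row in body["results"]:
        index_row(row)
    # An empty result list simply confirms the index was already fresh.
    return body.get("last_seq", checkpoint)


def fake_fetch(since):
    # Stand-in for the HTTP call; note the gap between seq 2 and seq 5.
    rows = [{"seq": s, "id": "doc%d" % s} for s in (1, 2, 5) if s > since]
    return {"results": rows, "last_seq": rows[-1]["seq"] if rows else since}


indexed = []
checkpoint = catch_up(fake_fetch, 0, indexed.append)
print(checkpoint, len(indexed))
```

Calling `catch_up` again with the returned checkpoint yields no rows, which is the "index is fresh" confirmation mentioned above.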

        Randall Leeds added a comment -

        Closing as "Not A Problem" as per Robert Newson's last comment. If this is incorrect, please re-open with a compelling argument to change this behavior.

        Bob Dionne added a comment -

        Wonderful. I was wondering about this, as this is precisely how I keep up to date in my native indexer, using _changes?since and storing the last_seq in a checkpoint.

        Robert Newson added a comment -

        The motivation for c-l's current approach is for indexes to be fresh at all times as well as updating the lucene indexes optimally (in modest sized batches). If it were purely driven by user queries, it would be variably stale, and variably well optimized, like couchdb's indexes. I wanted better for my child.

        Paul Joseph Davis added a comment -

        @Randall

        >If there's no technical reason why we need to bump the seq for changes that don't modify a document we could just stop doing that. What operations do this currently? How many of these resources have we at one time or another discussed making into a full document?

        I think _purge, _revs_limit, and _security. Maybe other things. Not sure about strictly conforming to a theoretical model for _revs_limit and _security, but _purge has obvious semantics that need to be updated in views. Though as I think about this, I'm fairly certain we have a bug in the indexer that no one has ever reported. Granted, that seems to be fairly true of things that touch _purge code.

        Jason Smith added a comment -

        Wait a second. Robert, you are not fixing a bug in C-L, you are working around a deficiency in CouchDB.

        The only way to know the latest sequence id is to make a complete _changes query. Next, follow that up with a continuous feed if you want to keep the state fresh.

        That is a paper cut.

        What if I want to see the most recent five changes? What if there are a hundred million documents? What if 99% of the time, update_seq equals last_seq and so developers assume it means something it doesn't?

        Everybody wants to know the id of the latest change. Nobody wants to know the "update sequence," whatever that is. If CouchDB can cache last_seq in the header and provide it in the DB response, that would be fantastic. Kindly reopen this ticket, then. Thanks!

        Randall Leeds added a comment -

        > Wait a second. Robert, you are not fixing a bug in C-L, you are working around a deficiency in CouchDB.

        Can't both be true?

        > The only way to know the latest sequence id is to make a complete _changes query. Next, follow that up with a continuous feed if you want to keep the state fresh.

        Nope. You can not ever know. You always know the latest sequence number at some arbitrarily recent point in time. If the (possibly continuous) changes feed says you're at X and you haven't heard anything more yet, then the database is at update sequence >= X. Nevertheless, I think I follow the sentiment. If last_seq were there, one could know from an info request whether or not more changes should be available.

        > What if I want to see the most recent five changes? What if there are a hundred million documents? What if 99% of the time, update_seq equals last_seq and so developers assume it means something it doesn't?

        In order:

        • /_changes?descending=true&limit=5
        • Not sure how this is relevant
        • This does indeed seem to cause some confusion. It clearly surprised Robert and Henrik and it's the first I've heard of this discrepancy.
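
The first answer in the list above can be turned into a request URL as follows. A minimal Python sketch: the helper name `recent_changes_url`, the host and the database name `mydb` are placeholders, and only the `descending` and `limit` query parameters come from the comment.

```python
from urllib.parse import urlencode


def recent_changes_url(base_url, db, limit=5):
    """Build a _changes URL that asks for the newest `limit` rows first."""
    query = urlencode({"descending": "true", "limit": limit})
    return "%s/%s/_changes?%s" % (base_url.rstrip("/"), db, query)


print(recent_changes_url("http://localhost:5984", "mydb"))
```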

        I submit that this bug is closed accurately with a suggestion to move proposals to improve the situation over to the dev list. Off the top of my head a partial list of suggestions goes something like:

        • Add additional information to the changes feed, perhaps with a query parameter (almost the reverse of include docs)
        • Stop incrementing the update sequence on certain kinds of non-document changes
        • Add more information to the db information response

        Please take it from there and we can work through a proposal. Thanks, everyone.

        Henrik Hofmeister added a comment -

        Although it's hard not to agree with Robert on c-l - I still can't really see a use-case for the update_seq value - specifically why one would ever want to know that revs_limit etc. has been updated - without being able to know what has been updated. Just "Something has been done - 5 times" ...?

        I'd say for the sake of expected output - c-l aside - this is still a bug / "hidden feature" - and should at the very least be documented?

        Randall Leeds added a comment -

        Re-opening in light of recent dev@ discussion. It appears action will be taken here. Thanks, Henrik.

        Randall Leeds added a comment -

        Updated the description and title to reflect the problem in general.

        Proposals so far:
        1. Add a new header field
        a. to track the highest value in the by_seq index
        b. to track header updates that do not affect by_seq, causing update_seq to behave in a manner more consistent with expectation
        2. Migrate the non-replicable metadata into the document API and hang it within the by_seq index

        As far as I can tell I'm the only proponent of (2). Proposal (2) is broader in scope, more difficult to implement, and fails to account for the possibility that other, current or future, database header updates may not fit into the document model. Therefore, I'll formally retract my suggestion that it be pursued as a solution to the present ticket.

        Resuming discussion back here (sorry if it was unnecessary or confusing that I migrated it to dev@), how does the community feel about (1a) vs (1b)? I'm in favor of 1b, myself.

        Robert Newson added a comment -

        I still don't see why we need to do anything. My early mistaken understanding of this value should not be used as motivation here. At the time, I was not a "couchdb committer" so the earlier implication that it escaped even my awesome knowledge is to impute omniscience where none is warranted.

        I haven't yet fixed couchdb-lucene as the update model is rather clumsy. Simply calling _changes?since=N instead of comparing update_seq with a local value will radically simplify that piece of couchdb-lucene. It should improve the internals of couchdb-lucene so significantly that I would rather not fix update_seq to work the way I expected years ago, in case it misleads someone into making the same mistake I made.

        That said, I wouldn't veto the change, but C-L will not depend on either the current or any future meaning of update_seq in the next release.
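
The `_changes?since=N` approach described above could be sketched roughly as follows. This is only an illustration, not couchdb-lucene's actual code: the helper names `changes_since_url` and `advance` are hypothetical, and the response shape (a `results` list plus `last_seq`) is assumed to match a normal, non-continuous changes feed.

```python
import json
from urllib.parse import quote, urlencode


def changes_since_url(server, dbname, since):
    """Build a _changes URL asking for everything after a locally stored seq.

    `server`, `dbname` and `since` are supplied by the caller (hypothetical helper).
    """
    query = urlencode({"since": since})
    return f"{server.rstrip('/')}/{quote(dbname, safe='')}/_changes?{query}"


def advance(body):
    """Return (new_local_seq, changed_doc_ids) from a _changes response body.

    An empty id list means there is nothing to index; no comparison against
    update_seq from the db info is needed.
    """
    feed = json.loads(body)
    changed_ids = [row["id"] for row in feed["results"]]
    return feed["last_seq"], changed_ids
```

A consumer would store the returned sequence and pass it as `since` on the next request, so it never needs to interpret `update_seq` at all.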

        Henrik Hofmeister added a comment -

        @Robert - C-L aside - it's still strange that it's just some 'random' number of actions taken, which is not really usable to the end user. That C-L could use a different implementation is beside the point imo - it's still useful to be able to get the current latest update seq - for whatever reason - even if just to show a progress bar in a to-be-developed feature in some GUI application or whatever.

        We use it internally to track progress of an aggregation application.

        @Randall: in terms of adding a field or whatever - my only input is - less is more - update_seq makes perfect sense naming-wise, it should just be the expected value. If you need to track changes other than doc changes later on in terms of replication or whatever - my uneducated guess is you're gonna need to do a lot more than just have a number increase in any case. But that's just me http://c2.com/xp/YouArentGonnaNeedIt.html

        Thanks

        Robert Newson added a comment -

        resolved in error

        Robert Newson added a comment -

        FYI: I've fixed couchdb-lucene. Instead of using "update_seq" from GET /dbname I instead grab "last_seq" from GET /dbname/_changes?limit=0&descending=true.
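
For readers who want to try the same request, a minimal sketch using only the Python standard library might look like this. The function names are hypothetical, the server URL is an assumption, and `fetch_last_seq` needs a reachable CouchDB instance; the URL and the `last_seq` field are the ones quoted in the comment above.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def last_seq_url(server, dbname):
    """Build GET /dbname/_changes?limit=0&descending=true for a given server."""
    query = urlencode({"limit": 0, "descending": "true"})
    return f"{server.rstrip('/')}/{quote(dbname, safe='')}/_changes?{query}"


def parse_last_seq(body):
    """Extract last_seq from a _changes JSON response body."""
    return json.loads(body)["last_seq"]


def fetch_last_seq(server, dbname):
    # Performs the HTTP request; assumes no authentication is required.
    with urlopen(last_seq_url(server, dbname)) as resp:
        return parse_last_seq(resp.read().decode("utf-8"))
```

Splitting URL construction and parsing from the network call keeps the two pure helpers easy to check without a running database.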


          People

          • Assignee:
            Unassigned
            Reporter:
            Henrik Hofmeister