CouchDB / COUCHDB-956

Return all _seq values as strings not integers

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Some fields are returned as strings in db_info and other places to protect against numeric overflow:

      {"db_name":"db","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":79,"instance_start_time":"1290088043619158","disk_format_version":5,"committed_update_seq":0}

      Here, instance_start_time is protected but, more critically, update_seq is not.

      If update_seq were to be wrapped due to precision issues, what breaks?
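
To make the precision concern concrete, here is a minimal Python sketch. It assumes a client that decodes every JSON number as an IEEE 754 double (as JavaScript does), simulated here with the `parse_int=float` hook of `json.loads`. The sequence value is hypothetical, chosen just above 2**53.

```python
import json

# Hypothetical update_seq just past 2**53; beyond this point not every
# integer can be represented exactly as an IEEE 754 double.
big_seq = 2**53 + 1

# Simulate a client that decodes all JSON numbers as doubles.
as_number = json.loads('{"update_seq": %d}' % big_seq, parse_int=float)

# The same value sent as a JSON string survives untouched.
as_string = json.loads('{"update_seq": "%d"}' % big_seq)

lossy = int(as_number["update_seq"])   # silently rounded to a neighbour
exact = int(as_string["update_seq"])   # identical to big_seq

print(big_seq, lossy, exact)
```

In this sketch the numeric form comes back as a different sequence than the one sent, while the string form round-trips exactly.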

        Activity

        Adam Kocoloski added a comment -

        The replicator sorts sequence values from the rows it has processed in the _changes feed to ensure that it always checkpoints at the highest one possible. Sorting them as strings results in the replicator skipping checkpoints after it reaches a sequence with a number of leading 9s.
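
The sorting problem described above can be reproduced in a few lines of Python. This is only an illustration of string versus numeric ordering, not the replicator's code, and the sequence values are made up.

```python
# Hypothetical sequences seen in a _changes feed, in arrival order.
seqs = [8, 9, 10, 11]

# Compared as integers, the highest sequence is the last one.
numeric_max = max(seqs)

# Compared as strings, "9" sorts after "10" and "11", so a checkpoint
# picked this way would stay stuck at 9.
as_strings = [str(s) for s in seqs]
string_max = max(as_strings)

# One possible way to keep string values but compare them numerically.
string_max_by_value = max(as_strings, key=int)

print(numeric_max, string_max, string_max_by_value)
```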

        Randall Leeds added a comment -

        How do we feel about just keeping the order in which they arrive and not trying to sort them?
        I feel like the responsibility should be on the server to provide a _changes feed in the correct order.

        Adam Kocoloski added a comment -

        Yes, it would be a good idea not to sort them. If I recall correctly, the reason for the sorting is that the reader will rearrange documents through its use of parallel connections. But really, we can leave them unsorted. Worst that will happen is that a few documents get checked again on the next replication.

        Randall Leeds added a comment -

        I think it's actually important that the high watermark is based on position in the input sequence; otherwise, the worst that happens is that documents are missed.

        Last I looked at it, the best solution that came to mind was to assign a local, ephemeral sequence within this replication session to changes as they arrive, for the purposes of tracking the high watermark during parallel fetching.
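
One possible reading of the local ephemeral sequence idea, sketched in Python. The class and method names (`WatermarkTracker`, `arrived`, `completed`) are hypothetical and do not come from the replicator. The sketch numbers each change as it arrives, treats the source sequence as an opaque value, and only advances the checkpoint once every earlier arrival has been processed.

```python
class WatermarkTracker:
    """Track a checkpoint by arrival position, not by comparing source seqs."""

    def __init__(self):
        self._next_local = 0    # next local ephemeral sequence to hand out
        self._source_seq = {}   # local sequence -> opaque source sequence
        self._done = set()      # local sequences finished out of order
        self._low = 0           # every local sequence below this is finished
        self.checkpoint = None  # source sequence that is safe to checkpoint

    def arrived(self, source_seq):
        """Record a change from the feed and return its local sequence."""
        local = self._next_local
        self._next_local += 1
        self._source_seq[local] = source_seq
        return local

    def completed(self, local):
        """Mark a change as processed and advance the watermark if possible."""
        self._done.add(local)
        # Advance only across a contiguous run of finished arrivals.
        while self._low in self._done:
            self._done.remove(self._low)
            self.checkpoint = self._source_seq.pop(self._low)
            self._low += 1
        return self.checkpoint


tracker = WatermarkTracker()
a = tracker.arrived("8")
b = tracker.arrived("9")
c = tracker.arrived("10")

first = tracker.completed(c)   # finished early; nothing is safe yet
second = tracker.completed(a)  # now "8" is safe
third = tracker.completed(b)   # "9" and "10" are both done
```

Because source sequences are never compared with each other here, it does not matter whether they are integers or strings. Under these assumptions a change that finishes early cannot push the checkpoint past one that is still in flight.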


          People

          • Assignee: Unassigned
          • Reporter: Robert Newson
          • Votes: 0
          • Watchers: 0
