  1. Kudu
  2. KUDU-430 Consistent Operations
  3. KUDU-1703

Handle snapshot reads that might block indefinitely

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      When we fix safe time advancement, replicas will start to block on snapshot scans until a timeout occurs, waiting to have a consistent view of the world at that timestamp before serving the scan. This will be a serious problem for lagging replicas, which might be several seconds or even minutes behind.

      Moreover, in the absence of writes, the same will happen even for non-lagging replicas, whose safe time advances only when the leader heartbeats.

      We need to at least make sure that:

      • Blocked scanner threads are not starving other work.
      • If the replica's safe time is lagging by a lot, the replica refuses to do the scan and the client retries it on another replica.
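
      The two requirements above can be sketched as a per-scan decision made before any thread blocks. This is only an illustration, not Kudu's implementation: the names ScanDecision, ScanPolicy, and decide_snapshot_scan, both threshold values, and the use of plain millisecond timestamps (rather than hybrid-time timestamps) are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class ScanDecision(Enum):
    SERVE = "serve"    # safe time already covers the snapshot timestamp
    WAIT = "wait"      # small gap: wait, but only up to a bounded budget
    REFUSE = "refuse"  # large gap: fail fast so the client can retry elsewhere


@dataclass(frozen=True)
class ScanPolicy:
    # Hypothetical tunables for illustration; these are not real Kudu flags.
    max_lag_ms: int = 30_000   # beyond this lag the replica refuses the scan
    max_wait_ms: int = 5_000   # upper bound on how long a scan may wait


def decide_snapshot_scan(snapshot_ts_ms, safe_time_ms, client_timeout_ms,
                         policy=ScanPolicy()):
    """Return (decision, wait_budget_ms) for a snapshot scan on one replica."""
    lag_ms = snapshot_ts_ms - safe_time_ms
    if lag_ms <= 0:
        # The replica already has a consistent view at the snapshot timestamp.
        return ScanDecision.SERVE, 0
    if lag_ms > policy.max_lag_ms:
        # Lagging by a lot: refuse instead of blocking until the timeout.
        return ScanDecision.REFUSE, 0
    # Slightly behind: wait, but never longer than the client or the policy allows.
    return ScanDecision.WAIT, min(client_timeout_ms, policy.max_wait_ms)
```

      Under these assumptions, a replica whose safe time is 2 seconds behind the snapshot would wait for at most min(client timeout, 5 seconds), while one that is 90 seconds behind would refuse immediately. Keeping the wait off the scanner thread pool (for example, by registering a callback on safe time advancement) is a separate design choice that this sketch does not cover.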

      We might also consider other optimizations, such as having an otherwise up-to-date replica that is only waiting on a heartbeat ping the leader.

              People

              • Assignee:
                dralves David Alves
                Reporter:
                dralves David Alves
              • Votes:
                0
                Watchers:
                1

                Dates

                • Created:
                  Updated:
                  Resolved: