Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Read-only mode
      Possible Mentor
      Henry Robinson (henry at apache dot org)

      Requirements
      Java and TCP/IP networking

      Description
      When a ZooKeeper server loses contact with over half of the other servers in an ensemble ('loses a quorum'), it stops responding to client requests because it cannot guarantee that writes will get processed correctly. For some applications, it would be beneficial if a server still responded to read requests when the quorum is lost, but caused an error condition when a write request was attempted.

      This project would implement a 'read-only' mode for ZooKeeper servers (maybe only for Observers) that allowed read requests to be served as long as the client can contact a server.

      This is a great project for getting really hands-on with the internals of ZooKeeper - you must be comfortable with Java and networking otherwise you'll have a hard time coming up to speed.
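
      The read/write split described above can be illustrated with a minimal sketch. It assumes a server that knows whether it currently has a quorum; the class, enum, and exception names are hypothetical and are not taken from the ZooKeeper code base.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch: gate requests on a server that may have lost its quorum. */
final class ReadOnlyGate {
    /** Illustrative operation names; not ZooKeeper's actual request types. */
    enum Op { GET_DATA, EXISTS, GET_CHILDREN, CREATE, SET_DATA, DELETE }

    /** Raised when a write is attempted while the server is read-only. */
    static final class ReadOnlyModeException extends RuntimeException {
        ReadOnlyModeException(Op op) {
            super("server is in read-only mode, rejecting " + op);
        }
    }

    private static final Set<Op> WRITES = EnumSet.of(Op.CREATE, Op.SET_DATA, Op.DELETE);

    // Flipped by whatever component tracks quorum membership (not modelled here).
    private volatile boolean hasQuorum = true;

    void setHasQuorum(boolean hasQuorum) { this.hasQuorum = hasQuorum; }

    boolean isReadOnly() { return !hasQuorum; }

    /** Lets reads through unconditionally; rejects writes once the quorum is lost. */
    void check(Op op) {
        if (isReadOnly() && WRITES.contains(op)) {
            throw new ReadOnlyModeException(op);
        }
    }
}
```

      A request handler would call check(op) before processing each request, so reads keep working after the quorum is lost while writes fail with an explicit error.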

          Activity

          Dave Wright added a comment -

          This is a great idea, but I'm afraid there is a somewhat fundamental problem with this concept.
          What you want is for the remaining nodes to go into read-only mode once enough nodes have "gone down" that a quorum can't be formed at all.

          The problem is that if a partition occurs (say, a single server loses contact with the rest of the cluster), but a quorum still exists, we want clients who were connected to the partitioned server to re-connect to a server in the majority. The current design allows for this by denying connections to minority nodes, forcing clients to hunt for the majority. If we allow servers in the minority to keep/accept connections, then clients will end up in read-only mode when they could have simply reconnected to the majority.

          It may be possible to accomplish the desired outcome with some client-side and connection protocol changes. Specifically, the client's connection request could carry an "allow read-only connections" flag: if it is false, the server will close the connection, allowing the client to hunt for a server in the majority. Once a client has gone through all the servers in the list (and found that none are in the majority), it could flip the flag to true and connect to any running server in read-only mode. There is still the question of how to get back out of read-only mode (e.g. should we keep hunting in the background for a majority, or just wait until the server we are connected to re-forms a quorum?).
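
          The two-pass hunt described above could look roughly like the sketch below. It is only an illustration: ReadOnlyHunt, Connection, and the tryConnect callback are hypothetical names, and the callback stands in for the real connection handshake, returning true when a server accepts the connection for the given flag value.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiPredicate;

/** Hypothetical sketch of a client that prefers the majority and falls back to read-only. */
final class ReadOnlyHunt {
    /** Outcome of a successful attempt: which server, and whether the session is read-only. */
    static final class Connection {
        final String server;
        final boolean readOnly;

        Connection(String server, boolean readOnly) {
            this.server = server;
            this.readOnly = readOnly;
        }
    }

    /**
     * tryConnect.test(server, allowReadOnly) is an assumed stand-in for the handshake:
     * it returns true if the server accepted the connection with that flag value.
     */
    static Optional<Connection> connect(List<String> servers,
                                        BiPredicate<String, Boolean> tryConnect) {
        // Pass 1: flag is false, so only servers that are part of a quorum accept.
        for (String server : servers) {
            if (tryConnect.test(server, false)) {
                return Optional.of(new Connection(server, false));
            }
        }
        // Pass 2: no majority server was found, so allow a read-only session.
        for (String server : servers) {
            if (tryConnect.test(server, true)) {
                return Optional.of(new Connection(server, true));
            }
        }
        return Optional.empty(); // no server reachable at all
    }
}
```

          The sketch leaves the open question untouched: it does not decide how a client in a read-only session gets back to a majority server.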

          Sergey Doroshenko added a comment -

          Dave, thanks for the feedback.

          Did you check http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode ?

          The approach described there is similar to what you've proposed: make the server distinguish read-only clients from regular ones.
          However, I was thinking that an r-o client should go into read-only mode right after the server it's tied to is partitioned, without trying to reconnect to the majority. But your idea that the client should try all servers first is definitely a better option.

          Also, I think the current behavior of the ZooKeeper client should remain unchanged.
          I mean, there should be either a new class for the r-o client, or new functionality in the current client that is explicitly triggered, say, by a flag passed to the ctor. The idea is not to break code for current users.
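
          As a rough illustration of the constructor-flag option, the sketch below keeps the existing constructor's behavior and adds an overload that opts in. SketchClient and its members are hypothetical and do not reflect the actual ZooKeeper client API.

```java
/** Hypothetical client showing an opt-in read-only flag that leaves existing callers untouched. */
final class SketchClient {
    private final String connectString;
    private final boolean allowReadOnly;

    /** Existing-style constructor: behavior is unchanged, read-only sessions are never accepted. */
    SketchClient(String connectString) {
        this(connectString, false);
    }

    /** New overload: callers must explicitly ask for read-only sessions. */
    SketchClient(String connectString, boolean allowReadOnly) {
        this.connectString = connectString;
        this.allowReadOnly = allowReadOnly;
    }

    String connectString() { return connectString; }

    boolean allowsReadOnly() { return allowReadOnly; }
}
```

          Code written against the one-argument constructor compiles and behaves as before, which is the compatibility property the comment asks for.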

          Sergey Doroshenko added a comment -

          I have updated the wiki page to describe a new (quite simple and elegant) approach to implementing the server-side part of the read-only mode.
          I already discussed this with Henry yesterday.

          Take a look if you're interested in the details, and let me know if you have any thoughts about this.

          Ivan Kelly added a comment -

          What's the status of this JIRA? Has it been put on the back burner, or is someone planning to come back to it fairly soon?

          Sergey Doroshenko added a comment -

          I'm on it. I'll attach updated patches for child tickets soon – probably next week.

          Ivan Kelly added a comment -

          Cool. I came across a use case for this yesterday and was going to pick it up if it wasn't being worked on.

          Looks like you have it in hand though.


            People

            • Assignee:
              Sergey Doroshenko
            • Reporter:
              Henry Robinson
            • Votes:
              0
            • Watchers:
              5