  1. Hadoop HDFS
  2. HDFS-12943

Consistent Reads from Standby Node


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.10.0, 3.3.0, 3.1.4, 3.2.2
    • Component/s: hdfs
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Observer is a new type of NameNode, in addition to the Active and Standby Nodes in HA settings. An Observer Node maintains a replica of the namespace, just as a Standby Node does, and additionally serves client read requests.

      To ensure read-after-write consistency within a single client, a state ID is introduced in RPC headers. The Observer responds to a client request only after its own state has caught up with the client’s state ID, which the client previously received from the Active NameNode.
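The state ID check described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the HDFS implementation: the classes `ActiveSketch` and `ObserverSketch`, the method `tryExists`, and the use of the journal length as the state ID are all hypothetical names and simplifications introduced for illustration. An empty result stands in for a request that the Observer keeps pending because it has not yet caught up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Toy Active NameNode (hypothetical): the journal length plays the role of the state ID. */
class ActiveSketch {
    private final List<String> journal = new ArrayList<>();

    /** Applies a write and returns the state ID the client should remember. */
    long create(String path) {
        journal.add(path);
        return journal.size();
    }

    /** Returns a copy of the edits recorded after the given state ID. */
    List<String> editsAfter(long stateId) {
        return new ArrayList<>(journal.subList((int) stateId, journal.size()));
    }
}

/** Toy Observer (hypothetical): answers a read only once it has caught up with the client. */
class ObserverSketch {
    private long appliedStateId = 0;
    private final Set<String> namespace = new HashSet<>();

    /** Applies every edit the Observer has not seen yet. */
    void tail(ActiveSketch active) {
        for (String path : active.editsAfter(appliedStateId)) {
            namespace.add(path);
            appliedStateId++;
        }
    }

    /** Empty result means "not answered yet": the Observer is behind the client's state ID. */
    Optional<Boolean> tryExists(String path, long clientStateId) {
        if (appliedStateId < clientStateId) {
            return Optional.empty();
        }
        return Optional.of(namespace.contains(path));
    }
}

class StateIdDemo {
    public static void main(String[] args) {
        ActiveSketch active = new ActiveSketch();
        ObserverSketch observer = new ObserverSketch();
        long clientStateId = active.create("/a");                    // client now holds state ID 1
        System.out.println(observer.tryExists("/a", clientStateId)); // Optional.empty
        observer.tail(active);                                       // Observer catches up
        System.out.println(observer.tryExists("/a", clientStateId)); // Optional[true]
    }
}
```

The point of the model is that the client's own write is never invisible to it: the read is either deferred or answered from a state at least as new as the one the client last saw.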

      Clients can explicitly invoke a new client protocol call msync(), which ensures that subsequent reads by this client from an Observer are consistent.
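A case where an explicit msync() matters is a client that reads data written by a different client, since its own state ID says nothing about the other client's writes. The sketch below models that case. It assumes that msync() amounts to a round trip to the Active that refreshes the client's last seen state ID; that assumption, and all class and method names (`MsyncActive`, `MsyncObserver`, `MsyncClient`, `existsOnObserver`), are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy Active NameNode (hypothetical): the number of edits plays the role of the state ID. */
class MsyncActive {
    final List<String> edits = new ArrayList<>();

    long mkdir(String path) {
        edits.add(path);
        return edits.size();
    }

    long currentStateId() {
        return edits.size();
    }
}

/** Toy Observer (hypothetical) that has applied only a prefix of the Active's edits. */
class MsyncObserver {
    private final MsyncActive active;
    private int applied = 0;

    MsyncObserver(MsyncActive active) {
        this.active = active;
    }

    void tailAll() {
        applied = active.edits.size();
    }

    long appliedStateId() {
        return applied;
    }

    boolean exists(String path) {
        return active.edits.subList(0, applied).contains(path);
    }
}

/** Toy client (hypothetical) that tracks the last state ID it has seen. */
class MsyncClient {
    private long lastSeenStateId = 0;
    private final MsyncActive active;
    private final MsyncObserver observer;

    MsyncClient(MsyncActive active, MsyncObserver observer) {
        this.active = active;
        this.observer = observer;
    }

    /** Assumed behavior: ask the Active for its current state ID and remember it. */
    void msync() {
        lastSeenStateId = Math.max(lastSeenStateId, active.currentStateId());
    }

    /** Reads via the Observer, which must first reach this client's state ID. */
    boolean existsOnObserver(String path) {
        if (observer.appliedStateId() < lastSeenStateId) {
            observer.tailAll(); // stands in for the Observer delaying its reply until caught up
        }
        return observer.exists(path);
    }
}

class MsyncDemo {
    public static void main(String[] args) {
        MsyncActive active = new MsyncActive();
        MsyncObserver observer = new MsyncObserver(active);
        MsyncClient reader = new MsyncClient(active, observer);

        active.mkdir("/shared");                                   // written by some other client
        System.out.println(reader.existsOnObserver("/shared"));   // false: stale read is possible
        reader.msync();
        System.out.println(reader.existsOnObserver("/shared"));   // true: read is now consistent
    }
}
```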

      A new client-side ObserverReadProxyProvider is introduced to switch automatically between the Active and Observer NameNodes, submitting write requests to the Active and read requests to an Observer.
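The read/write routing idea can be sketched with a plain Java dynamic proxy. This is an illustration under assumptions, not the ObserverReadProxyProvider code: the annotation `ReadOnlyCall`, the reduced interface `NamespaceProtocol`, and the classes `RecordingNameNode` and `RoutingHandler` are hypothetical. The sketch only shows dispatch by annotation and leaves out failover, observer discovery, and state ID handling.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker for calls that do not modify the namespace. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ReadOnlyCall {}

/** Hypothetical, much-reduced client protocol. */
interface NamespaceProtocol {
    void mkdir(String path);

    @ReadOnlyCall
    boolean exists(String path);
}

/** Toy NameNode that records which calls it served. */
class RecordingNameNode implements NamespaceProtocol {
    final List<String> served = new ArrayList<>();

    @Override
    public void mkdir(String path) {
        served.add("mkdir " + path);
    }

    @Override
    public boolean exists(String path) {
        served.add("exists " + path);
        return true;
    }
}

/** Sends annotated read calls to the observer and every other call to the active. */
class RoutingHandler implements InvocationHandler {
    private final NamespaceProtocol active;
    private final NamespaceProtocol observer;

    RoutingHandler(NamespaceProtocol active, NamespaceProtocol observer) {
        this.active = active;
        this.observer = observer;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        NamespaceProtocol target =
            method.isAnnotationPresent(ReadOnlyCall.class) ? observer : active;
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the target's own exception to the caller
        }
    }

    static NamespaceProtocol newProxy(NamespaceProtocol active, NamespaceProtocol observer) {
        return (NamespaceProtocol) Proxy.newProxyInstance(
            NamespaceProtocol.class.getClassLoader(),
            new Class<?>[] {NamespaceProtocol.class},
            new RoutingHandler(active, observer));
    }
}

class RoutingDemo {
    public static void main(String[] args) {
        RecordingNameNode active = new RecordingNameNode();
        RecordingNameNode observer = new RecordingNameNode();
        NamespaceProtocol client = RoutingHandler.newProxy(active, observer);

        client.mkdir("/d");   // write: served by the active
        client.exists("/d");  // read: served by the observer
        System.out.println("active served " + active.served);
        System.out.println("observer served " + observer.served);
    }
}
```

Marking read methods declaratively keeps the routing decision out of the call sites, so application code can use the same interface whether or not an Observer is present.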

    Description

      StandbyNode in HDFS is a replica of the active NameNode. The states of the NameNodes are coordinated via the journal, so it is natural to consider StandbyNode as a read-only replica. As with any replicated distributed system, the problem of stale reads must be resolved. Our main goal is to provide consistent reads from the standby in order to enable a wide range of existing applications running on top of HDFS.

      Attachments

        1. ConsistentReadsFromStandbyNode.pdf
          396 kB
          Konstantin Shvachko
        2. ConsistentReadsFromStandbyNode.pdf
          394 kB
          Konstantin Shvachko
        3. HDFS-12943-001.patch
          328 kB
          Konstantin Shvachko
        4. HDFS-12943-002.patch
          354 kB
          Konstantin Shvachko
        5. HDFS-12943-003.patch
          353 kB
          Konstantin Shvachko
        6. HDFS-12943-004.patch
          353 kB
          Konstantin Shvachko
        7. TestPlan-ConsistentReadsFromStandbyNode.pdf
          79 kB
          Konstantin Shvachko

        Issue Links

        1. Tailing edits should not update quota counts on ObserverNode (Sub-task, Resolved, Erik Krogen)
        2. Changes to the NameNode to support reads from standby (Sub-task, Resolved, Chao Sun)
        3. Introduce ObserverReadProxyProvider (Sub-task, Resolved, Chao Sun)
        4. [Edit Tail Fast Path] Allow SbNN to tail in-progress edits from JN via RPC (Sub-task, Resolved, Erik Krogen)
        5. Make Client field AlignmentContext non-static. (Sub-task, Resolved, Plamen Jeliazkov)
        6. Add stateId to RPC headers. (Sub-task, Resolved, Plamen Jeliazkov)
        7. Fine-grained locking while consuming journal stream. (Sub-task, Resolved, Konstantin Shvachko)
        8. StandbyNode should upload FsImage to ObserverNode after checkpointing. (Sub-task, Resolved, Chen Liang)
        9. Add haadmin commands to transition between standby and observer (Sub-task, Resolved, Chao Sun)
        10. Support observer reads for WebHDFS (Sub-task, Open, Chao Sun)
        11. Allow Observer to participate in NameNode failover (Sub-task, Open, Unassigned)
        12. Standby NameNode should roll active edit log when checkpointing (Sub-task, Resolved, Unassigned)
        13. Add lastSeenStateId to RpcRequestHeader. (Sub-task, Resolved, Plamen Jeliazkov)
        14. HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients. (Sub-task, Resolved, Simbarashe Dzinamarira; time spent 20h 50m)
        15. Support observer nodes in MiniDFSCluster (Sub-task, Resolved, Konstantin Shvachko)
        16. Add ReadOnly annotation to methods in ClientProtocol (Sub-task, Resolved, Chao Sun)
        17. [Edit Tail Fast Path Pt 1] Enhance JournalNode with an in-memory cache of recent edit transactions (Sub-task, Resolved, Erik Krogen)
        18. [Edit Tail Fast Path Pt 2] Add ability for JournalNode to serve edits via RPC (Sub-task, Resolved, Erik Krogen)
        19. [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC (Sub-task, Resolved, Erik Krogen)
        20. [Edit Tail Fast Path Pt 4] Cleanup: integration test, documentation, remove unnecessary dummy sync (Sub-task, Resolved, Erik Krogen)
        21. Move RPC response serialization into Server.doResponse (Sub-task, Resolved, Plamen Jeliazkov)
        22. Introduce msync API call (Sub-task, Resolved, Chen Liang)
        23. NameNodeRpcServer getEditsFromTxid assumes it is run on active NameNode (Sub-task, Open, Unassigned)
        24. ClientGCIContext should be correctly named ClientGSIContext (Sub-task, Resolved, Konstantin Shvachko)
        25. Use getServiceStatus to discover observer namenodes (Sub-task, Resolved, Chao Sun)
        26. Add msync server implementation. (Sub-task, Resolved, Chen Liang)
        27. TestStateAlignmentContextWithHA should use real ObserverReadProxyProvider instead of AlignmentContextProxyProvider. (Sub-task, Resolved, Plamen Jeliazkov)
        28. Implement performFailover logic for ObserverReadProxyProvider. (Sub-task, Resolved, Erik Krogen)
        29. Postpone NameNode state discovery in ObserverReadProxyProvider until the first real RPC call. (Sub-task, Resolved, Chen Liang)
        30. Unit tests for standby reads. (Sub-task, Resolved, Unassigned)
        31. ObserverReadProxyProvider should work with IPFailoverProxyProvider (Sub-task, Resolved, Konstantin Shvachko)
        32. Reduce logging frequency of QuorumJournalManager#selectInputStreams (Sub-task, Resolved, Erik Krogen)
        33. Limit logging frequency of edit tail related statements (Sub-task, Resolved, Erik Krogen)
        34. Refactor NameNode failover proxy providers (Sub-task, Resolved, Konstantin Shvachko)
        35. Remove AlignmentContext from AbstractNNFailoverProxyProvider (Sub-task, Resolved, Konstantin Shvachko)
        36. Only some protocol methods should perform msync wait (Sub-task, Resolved, Erik Krogen)
        37. ObserverNode should reject read requests when it is too far behind. (Sub-task, Resolved, Konstantin Shvachko)
        38. Add mechanism to allow certain RPC calls to bypass sync (Sub-task, Resolved, Chen Liang)
        39. Throw retriable exception for getBlockLocations when ObserverNameNode is in safemode (Sub-task, Resolved, Chao Sun)
        40. Add a configuration to turn on/off observer reads (Sub-task, Open, Shweta; time spent 40m)
        41. Handle BlockMissingException when reading from observer (Sub-task, Resolved, Chao Sun)
        42. Unit Test for transitioning between different states (Sub-task, Resolved, Sherwood Zheng)
        43. Fix crlf line endings in HDFS-12943 branch (Sub-task, Resolved, Konstantin Shvachko)
        44. Test reads from standby on a secure cluster with IP failover (Sub-task, Resolved, Chen Liang)
        45. TestObserverNode refactoring (Sub-task, Resolved, Konstantin Shvachko)
        46. Introduce the single Observer failure (Sub-task, Resolved, Sherwood Zheng)
        47. ObserverReadProxyProvider should enable observer read by default (Sub-task, Resolved, Chen Liang)
        48. ObserverReadProxyProviderWithIPFailover should work with HA configuration (Sub-task, Resolved, Chen Liang)
        49. Emulate Observer node falling far behind the Active (Sub-task, Resolved, Sherwood Zheng)
        50. NN status discovery does not leverage delegation token (Sub-task, Resolved, Chen Liang)
        51. Test reads from standby on a secure cluster with Configured failover (Sub-task, Resolved, Plamen Jeliazkov)
        52. Allow manual failover between standby and observer (Sub-task, Resolved, Chao Sun)
        53. Allow manual transition from Standby to Observer (Sub-task, Resolved, Unassigned)
        54. Fix the order of logging arguments in ObserverReadProxyProvider. (Sub-task, Resolved, Ayush Saxena)
        55. Fix class cast error in NNThroughputBenchmark with ObserverReadProxyProvider. (Sub-task, Resolved, Chao Sun)
        56. ORFPP should also clone DT for the virtual IP (Sub-task, Resolved, Chen Liang)
        57. Make ZKFC ObserverNode aware (Sub-task, Resolved, xiangheng)
        58. Create user guide for "Consistent reads from Observer" feature. (Sub-task, Resolved, Chao Sun)
        59. Move ipfailover config key out of HdfsClientConfigKeys (Sub-task, Resolved, Chen Liang)
        60. Handle exception from internalQueueCall (Sub-task, Resolved, Chao Sun)
        61. Adjust annotations on new interfaces/classes for SBN reads. (Sub-task, Resolved, Chao Sun)
        62. Description errors in the comparison logic of transaction ID (Sub-task, Resolved, xiangheng)
        63. Update "Consistent Read from Observer" User Guide with Edit Tailing Frequency (Sub-task, Resolved, Erik Krogen)
        64. Document dfs.ha.tail-edits.period in user guide. (Sub-task, Resolved, Chao Sun)
        65. ObserverReadInvocationHandler should implement RpcInvocationHandler (Sub-task, Resolved, Konstantin Shvachko)
        66. Balancer should work with ObserverNode (Sub-task, Resolved, Erik Krogen)
        67. Fix white spaces related to SBN reads. (Sub-task, Resolved, Konstantin Shvachko)
        68. [SBN read] Unclear Log.WARN message in GlobalStateIdContext (Sub-task, Resolved, Shweta)
        69. [SBN Read] StateId and TrasactionId not present in Trace level logging (Sub-task, Resolved, Shweta)
        70. Throwing RemoteException in the time of Read Operation (Sub-task, Resolved, Unassigned)
        71. [SBN Read] Add the document link to the top page (Sub-task, Resolved, Takanobu Asanuma)
        72. [SBN read] Got an unexpected txid when tail editlog (Sub-task, Resolved, Zhaohui Wang)
        73. Fix logging error in TestEditLog#testMultiStreamsLoadEditWithConfMaxTxns (Sub-task, Resolved, Jonathan Hung)
        74. [SBN read] Change client logging to be less aggressive (Sub-task, Resolved, Chen Liang)
        75. [SBN read] StanbyNode does not come out of safemode while adding new blocks. (Sub-task, Resolved, Unassigned)
        76. [SBN read] reportBadBlock is rejected by Observer. (Sub-task, Open, Unassigned)
        77. [SBN read] Revisit GlobalStateIdContext locking when getting server state id (Sub-task, Resolved, Chen Liang)
        78. [SBN read] Allow configurably enable/disable AlignmentContext on NameNode (Sub-task, Resolved, Chen Liang)
        79. Prevent Observer NameNode from becoming StandBy NameNode (Sub-task, Resolved, Aihua Xu)
        80. RBF: Support observer node from Router-Based Federation (Sub-task, Resolved, Simbarashe Dzinamarira)


          People

            Assignee: Konstantin Shvachko (shv)
            Reporter: Konstantin Shvachko (shv)
            Votes: 4
            Watchers: 87


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 21.5h
