Hadoop HDFS / HDFS-14378

Simplify the multiple-NN design and the logic of both edit log rolling and checkpointing

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: ha, namenode
    • Labels:

      Description

            HDFS-6440 introduced a mechanism to support more than two NNs. It implements a first-writer-wins policy to avoid duplicate fsimage downloads. The variable 'isPrimaryCheckPointer' holds the first-writer state, and the SNN that holds it provides the fsimage to the ANN the next time. As a result, the NN cluster has three roles: the ANN, one primary SNN, and one or more normal SNNs.
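To make the first-writer-wins idea concrete, here is a minimal sketch of the policy as described above. It is a simplified model, not the HDFS-6440 code: the classes `ActiveNode` and `StandbyNode` and the method `offerImage` are hypothetical names, and only the field name `isPrimaryCheckPointer` comes from the description.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of a first-writer-wins upload policy (hypothetical names,
// not the real NameNode classes).
class FirstWriterWinsSketch {

    /** Stand-in for the ANN: accepts an image only if it is newer than the one it has. */
    static final class ActiveNode {
        private final AtomicLong latestImageTxId = new AtomicLong(0);

        /** Returns true only for the first caller that offers a given newer txid. */
        boolean offerImage(long txId) {
            while (true) {
                long current = latestImageTxId.get();
                if (txId <= current) {
                    return false; // another SNN already delivered this image
                }
                if (latestImageTxId.compareAndSet(current, txId)) {
                    return true;  // this caller is the first writer
                }
            }
        }
    }

    /** Stand-in for an SNN: remembers whether its last upload won. */
    static final class StandbyNode {
        boolean isPrimaryCheckPointer = false;

        void checkpoint(ActiveNode ann, long txId) {
            // The winner becomes (or stays) the primary checkpointer;
            // a loser falls back to being a normal SNN.
            isPrimaryCheckPointer = ann.offerImage(txId);
        }
    }
}
```

In this model, two standbys that offer the same checkpoint end up with different `isPrimaryCheckPointer` values, which is the extra per-SNN state the proposal below aims to remove.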

            Since HDFS-12248, there may briefly be more than two primary SNNs after an exception occurs. That change handles the scenario in which an SNN does not upload the fsimage on IOException (IOE) and InterruptedException. Although this causes no further functional issues, the resulting state is inconsistent.

            Furthermore, the edit log may be rolled more frequently than necessary when there are multiple Standby NameNodes (see HDFS-14349). (I'm not entirely sure about this; I will verify it with unit tests, or perhaps someone can confirm it.)

            Given the above, I'm wondering whether we could simplify things with the following changes:

      • There are only two roles: ANN and SNN.
      • The ANN rolls its own edit log once every DFS_HA_LOGROLL_PERIOD_KEY period.
      • The ANN selects an SNN from which to download a checkpoint.

      An SNN only tails the edit log and creates checkpoints, and it serves the fsimage for download through a servlet as usual. An SNN no longer tries to roll the edit log or send a checkpoint request to the ANN.

      In short, the ANN takes a more active role. Suggestions are welcome.
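The proposed division of work can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the attached patches: `StandbyInfo`, `shouldRollEditLog`, and `selectStandby` are hypothetical names, and the selection rule (pick the SNN with the newest checkpoint that is ahead of the ANN's own image) is an assumption, since the description does not say how the ANN chooses an SNN.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of ANN-driven edit log rolling and checkpoint download
// (hypothetical names; the selection rule is an assumption).
class ActiveDrivenCheckpointSketch {

    /** What the ANN is assumed to know about each SNN. */
    static final class StandbyInfo {
        final String name;
        final long checkpointTxId; // txid of the SNN's newest checkpoint

        StandbyInfo(String name, long checkpointTxId) {
            this.name = name;
            this.checkpointTxId = checkpointTxId;
        }
    }

    /** The ANN rolls its own edit log once the configured period has elapsed. */
    static boolean shouldRollEditLog(long nowMs, long lastRollMs, long rollPeriodMs) {
        return nowMs - lastRollMs >= rollPeriodMs;
    }

    /**
     * The ANN picks one SNN to download from. Only SNNs whose checkpoint is
     * newer than the ANN's current image are candidates; the newest one wins.
     */
    static Optional<StandbyInfo> selectStandby(List<StandbyInfo> standbys,
                                               long activeImageTxId) {
        return standbys.stream()
                .filter(s -> s.checkpointTxId > activeImageTxId)
                .max(Comparator.comparingLong((StandbyInfo s) -> s.checkpointTxId));
    }
}
```

With this split, no SNN needs to track whether it is the primary checkpointer: every SNN behaves the same way, and the only decisions (when to roll, which image to fetch) are made in one place on the ANN.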

       

        Attachments

        1. HDFS-14378-trunk.006.patch
          81 kB
          star
        2. HDFS-14378-trunk.005.patch
          81 kB
          star
        3. HDFS-14378-trunk.004.patch
          76 kB
          star
        4. HDFS-14378-trunk.003.patch
          65 kB
          star
        5. HDFS-14378-trunk.002.patch
          65 kB
          star
        6. HDFS-14378-trunk.001.patch
          21 kB
          star

              People

              • Assignee:
                starphin star
                Reporter:
                starphin star
              • Votes:
                1
                Watchers:
                9