Hadoop HDFS / HDFS-14511

FSEditLog writes to both the Quorum Journal and the local disk by default in HA using QJM


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: namenode, qjm
    • Labels: None

    Description

      Recently, I ran into a problem with FSEditLog in an HA deployment using QJM: the NameNode entered a suspended state and could no longer process other RPC requests.
      The root cause was very high load on the local disk, which blocked flushing edit log records to the local directory. Subsequent RPC requests then occupied all RPC handlers, because by default FSEditLog writes every edit log record to both the FileJournal (a local directory, co-located with the FsImage) and the QuorumJournal, one after the other, and there is no configuration option to switch off the FileJournal. However, the local edit log is effectively unused in this setup.
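      To make the blocking behavior concrete, here is a minimal Java sketch, assuming journals are flushed one after another as described above. `SimpleJournal`, `SlowJournal`, and `SequentialJournalSet` are hypothetical names for illustration only, not Hadoop's `JournalSet` or `JournalManager` API, and the flush delays are made-up numbers. The point is only that a synchronous, sequential sync costs the sum of all journal flush times, so one slow local disk stalls the caller even when the quorum journal responds quickly.

```java
import java.util.List;

// Hypothetical, simplified journal abstraction (not Hadoop's JournalManager).
interface SimpleJournal {
    void flush();
}

// Hypothetical journal whose flush takes a fixed time, standing in for
// either a quorum journal or a heavily loaded local disk.
final class SlowJournal implements SimpleJournal {
    private final long flushMillis;

    SlowJournal(long flushMillis) {
        this.flushMillis = flushMillis;
    }

    @Override
    public void flush() {
        try {
            Thread.sleep(flushMillis); // simulate disk or network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
    }
}

// Hypothetical journal set that flushes each journal in order, as the
// issue describes for the default QuorumJournal + FileJournal setup.
final class SequentialJournalSet {
    private final List<SimpleJournal> journals;

    SequentialJournalSet(List<? extends SimpleJournal> journals) {
        this.journals = List.copyOf(journals);
    }

    // Flushes every journal one after another and returns the elapsed time
    // in milliseconds; the caller is blocked for the whole duration.
    long syncAll() {
        long start = System.nanoTime();
        for (SimpleJournal journal : journals) {
            journal.flush();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

      For example, with a 20 ms quorum flush and a 200 ms local flush, `new SequentialJournalSet(List.of(new SlowJournal(20), new SlowJournal(200))).syncAll()` takes at least 200 ms. As the description notes, other RPC handlers that need to log edits queue up behind such a sync, which is how a slow local disk can end up occupying every handler.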
      In more detail, by default the locations that edit logs are written to are determined by the configuration items 'dfs.namenode.shared.edits.dir' and 'dfs.namenode.name.dir'. ('dfs.namenode.edits.dir' is deprecated; if it is not set, it is replaced by 'dfs.namenode.name.dir', where the FsImage is located.) So JournalSet = QuorumJournal (SharedEditsDirs, set by 'dfs.namenode.shared.edits.dir') + FileJournal (LocalStorageEditsDirs, set by 'dfs.namenode.name.dir' by default). Meanwhile, both of these configuration items must be set in HA with QJM.
      In short, by default every edit log record is written twice, to both QJM and the local disk, and the current implementation offers no way to turn off the local write. I propose that we give users a choice, or turn off local edit log writes by default in HA using QJM.
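      The configuration resolution described above, plus the proposed switch, can be sketched in Java as follows. This is a simplified model under stated assumptions, not the actual NameNode code: `JournalSetSketch`, `resolveJournals`, the `QuorumJournal:`/`FileJournal:` labels, and especially the `localEditsEnabled` parameter are hypothetical. `localEditsEnabled` stands in for the option this issue proposes and does not correspond to any existing Hadoop setting; passing `true` models today's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of how the journal set is assembled from configuration.
final class JournalSetSketch {

    // Returns the journals edits would be written to, in write order.
    // localEditsEnabled is a HYPOTHETICAL switch modeling the proposal;
    // true reproduces the current default (QuorumJournal + FileJournal).
    static List<String> resolveJournals(Map<String, String> conf, boolean localEditsEnabled) {
        List<String> journals = new ArrayList<>();

        // Shared edits (QJM) come from dfs.namenode.shared.edits.dir.
        String shared = conf.get("dfs.namenode.shared.edits.dir");
        if (shared != null && !shared.isEmpty()) {
            journals.add("QuorumJournal:" + shared);
        }

        if (localEditsEnabled) {
            // Per the description: if dfs.namenode.edits.dir is unset,
            // fall back to dfs.namenode.name.dir (where the FsImage lives).
            String local = conf.getOrDefault("dfs.namenode.edits.dir",
                    conf.get("dfs.namenode.name.dir"));
            if (local != null) {
                for (String dir : local.split(",")) {
                    String trimmed = dir.trim();
                    if (!trimmed.isEmpty()) {
                        journals.add("FileJournal:" + trimmed);
                    }
                }
            }
        }
        return journals;
    }
}
```

      With only 'dfs.namenode.shared.edits.dir' and 'dfs.namenode.name.dir' set, `resolveJournals(conf, true)` yields one QuorumJournal entry followed by one FileJournal entry per name directory, matching the double write described above; `resolveJournals(conf, false)` yields only the QuorumJournal entry.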


            People

              Assignee: Xiaoqiao He (hexiaoqiao)
              Reporter: Xiaoqiao He (hexiaoqiao)
              Votes: 0
              Watchers: 7
