Hadoop HDFS / HDFS-7609

Avoid retry cache collision when Standby NameNode loading edits

    Details

    • Hadoop Flags:
      Reviewed

      Description

      One day my NameNode crashed because two JournalNodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.)
      I tried to restart the NameNode, but it showed that almost 20 hours would be needed to finish, and it was loading the edits log most of that time. I also tried to restart the NameNode in recovery mode, but the loading speed was no different.
      I looked into the stack trace and judged that the slowdown was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour.

      I think the retry cache is useless during startup, at least during the recovery process.
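
      The workaround described above amounts to one configuration change. A minimal sketch of the corresponding hdfs-site.xml fragment (the property name comes from the report itself; whether disabling the cache is acceptable for a given cluster is an assumption to verify):

      ```xml
      <!-- hdfs-site.xml: disable the NameNode retry cache so that edit-log
           replay skips per-transaction retry-cache bookkeeping.
           Caution (assumption): with the cache off, retried non-idempotent
           client RPCs can be applied twice, so re-enable it once the
           NameNode has finished loading edits. -->
      <property>
        <name>dfs.namenode.enable.retrycache</name>
        <value>false</value>
      </property>
      ```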

        Attachments

        1. HDFS-7609.patch
          21 kB
          Ming Ma
        2. HDFS-7609-2.patch
          27 kB
          Ming Ma
        3. HDFS-7609-3.patch
          22 kB
          Ming Ma
        4. HDFS-7609-branch-2.7.2.txt
          23 kB
          Vinod Kumar Vavilapalli
        5. HDFS-7609-CreateEditsLogWithRPCIDs.patch
          2 kB
          Chris Nauroth
        6. recovery_do_not_use_retrycache.patch
          2 kB
          Carrey Zhan

              People

              • Assignee:
                Ming Ma (mingma)
                Reporter:
                Carrey Zhan (CarreyZhan)
              • Votes:
                0
                Watchers:
                18

                Dates

                • Created:
                  Updated:
                  Resolved: