Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Invalid
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: None
    • Component/s: ha, namenode
    • Labels:
      None

      Description

      On transition to active, we have to take the FSNS write lock. In EditLogTailer#stop, we interrupt the edit log tailer thread and then join on that thread. When tailing edits, the edit log tailer thread acquires the FSNS write lock interruptibly, precisely so that the interrupt pulls it out of the wait for the lock and we avoid deadlocks on transition to active. However, the edit log tailer thread now also triggers edit log rolls, which go through ipc.Client. Several places in ipc.Client catch and ignore InterruptedException, and in so doing may cause the Thread#interrupt call to be missed by the edit log tailer thread. If that happens, the tailer thread can go on to block on the FSNS write lock held by the transitioning thread while that thread is blocked in join, which is a deadlock.
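      The failure mode described above can be reproduced outside Hadoop. The following is a minimal sketch, not the actual EditLogTailer or ipc.Client code: the class and method names are hypothetical, a plain ReentrantReadWriteLock stands in for the FSNS lock, and Thread.sleep stands in for the blocking RPC. For contrast, it also includes a variant that re-asserts the interrupt status after catching the exception.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone model of the problem; none of these names exist in Hadoop.
class LostInterruptDemo {

    // Stand-in for an RPC helper that catches InterruptedException.
    // With restoreInterrupt == false the interrupt is silently lost.
    static void blockingRpc(CountDownLatch started, boolean restoreInterrupt) {
        started.countDown();
        try {
            Thread.sleep(60_000); // pretend to wait for an RPC response
        } catch (InterruptedException e) {
            if (restoreInterrupt) {
                Thread.currentThread().interrupt(); // re-assert the interrupt status
            }
            // otherwise: swallowed, the interrupt status stays cleared
        }
    }

    // Returns true if the tailer thread exits after being interrupted,
    // false if it is still stuck waiting for the write lock.
    static boolean tailerExitsAfterInterrupt(boolean restoreInterrupt) {
        ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
        CountDownLatch inRpc = new CountDownLatch(1);
        Thread tailer = new Thread(() -> {
            blockingRpc(inRpc, restoreInterrupt); // "trigger log roll"
            try {
                fsnLock.writeLock().lockInterruptibly(); // "tail edits"
                fsnLock.writeLock().unlock();
            } catch (InterruptedException e) {
                // interrupt observed while acquiring the lock: exit
            }
        });
        tailer.setDaemon(true);

        fsnLock.writeLock().lock(); // the transition to active holds the write lock
        try {
            tailer.start();
            inRpc.await();
            tailer.interrupt();
            tailer.join(500); // bounded here; an unbounded join would hang forever
            return !tailer.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            fsnLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("swallowed: exits = " + tailerExitsAfterInterrupt(false));
        System.out.println("restored:  exits = " + tailerExitsAfterInterrupt(true));
    }
}
```

      The join is bounded only so the sketch terminates; with an unbounded join, as in the description, the swallowed-interrupt case would never return.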

          Activity

          Aaron T. Myers added a comment -

          I see three options:

          1. Make o.a.h.ipc.Client not catch InterruptedException. (Todd mentioned that this is already filed as some trunk JIRA, but I can't find it right now.)
          2. Add a check of shouldRun after the edit log tailer thread triggers a log roll but before it tries to acquire the FSNS lock, and break out of the loop if the tailer has been told to stop.
          3. Move edit log roll triggering to a separate thread.

          Thoughts?
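          Option 2 can be sketched as follows. This is only an illustration under stated assumptions, not the patch itself: the class is hypothetical, shouldRun is assumed to be a volatile flag that stop clears before interrupting the thread, a ReentrantReadWriteLock stands in for the FSNS lock, and Thread.sleep stands in for the log roll RPC that swallows the interrupt.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the tailer loop with the extra shouldRun check.
class TailerLoopSketch {
    private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
    private final CountDownLatch inRpc = new CountDownLatch(1);
    private volatile boolean shouldRun = true;
    private final Thread tailer = new Thread(this::runLoop);

    // Stand-in for the log roll RPC; it swallows the interrupt on purpose.
    private void triggerLogRoll() {
        inRpc.countDown();
        try {
            Thread.sleep(60_000);
        } catch (InterruptedException e) {
            // swallowed, as some ipc.Client code paths are described as doing
        }
    }

    private void runLoop() {
        while (shouldRun) {
            triggerLogRoll();
            if (!shouldRun) {
                break; // the added check: do not go near the lock once stopped
            }
            try {
                fsnLock.writeLock().lockInterruptibly();
                try {
                    // apply edits here
                } finally {
                    fsnLock.writeLock().unlock();
                }
            } catch (InterruptedException e) {
                return; // interrupt seen while waiting for the lock
            }
        }
    }

    // Models the transition to active: hold the lock, stop the tailer, join.
    // Returns true if the tailer thread exited.
    static boolean stopsWhileLockHeld() {
        TailerLoopSketch s = new TailerLoopSketch();
        s.tailer.setDaemon(true);
        s.fsnLock.writeLock().lock();
        try {
            s.tailer.start();
            s.inRpc.await();
            s.shouldRun = false; // clear the flag before interrupting
            s.tailer.interrupt();
            s.tailer.join(2_000);
            return !s.tailer.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            s.fsnLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("tailer stopped: " + stopsWhileLockHeld());
    }
}
```

          The ordering in the stop path matters in this sketch: because the flag is cleared before the interrupt is sent, an interrupt that lands inside the RPC and gets swallowed is still caught by the flag check, and one that lands after the check is caught by lockInterruptibly.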

          Aaron T. Myers added a comment -

          I forgot that HDFS-2737 hasn't been committed yet, and this bug is only present in the latest patch for that JIRA. I'm resolving this issue and will post an updated patch on HDFS-2737.


            People

            • Assignee:
              Aaron T. Myers
              Reporter:
              Aaron T. Myers
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: