Oozie / OOZIE-1938

Fork-join job does not execute join node sometimes during HA failover

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: trunk
    • Fix Version/s: None
    • Component/s: HA
    • Labels:
      None

      Description

      Reported by [~mchiang].

      Scenario: (2 Oozie HA servers)
      21:38:56 submit job at oozie client
      21:41:42 shut down server1
      21:46:52 shut down server2
      21:47:30 start server1
      22:15:05 start server2

      The last fork path ended at 21:52:53.
      At 22:36:48 the job was still RUNNING and had not transitioned to the join node.
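
The timeline above can be checked with a short calculation. This is only an illustrative sketch: it assumes all timestamps fall on the same day and come from the same clock, and the event names are labels made up for this example, not Oozie identifiers.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

# Timestamps copied from the scenario (assumed to be on the same day).
# The keys are illustrative labels, not Oozie terms.
events = {
    "submit": "21:38:56",
    "server1_down": "21:41:42",
    "server2_down": "21:46:52",
    "server1_up": "21:47:30",
    "last_fork_end": "21:52:53",
    "server2_up": "22:15:05",
    "observed_running": "22:36:48",
}

t = {name: datetime.strptime(ts, FMT) for name, ts in events.items()}

# Window during which neither server was running.
both_down = t["server1_up"] - t["server2_down"]

# Time the job stayed RUNNING after the last fork path ended.
stuck = t["observed_running"] - t["last_fork_end"]

# Which servers were up when the last fork path ended?
server1_up_at_fork_end = t["server1_up"] < t["last_fork_end"]
server2_absent_at_fork_end = t["server2_down"] < t["last_fork_end"] < t["server2_up"]

print("both servers down for:", both_down)
print("job idle after last fork ended:", stuck)
print("only server1 up at last fork end:",
      server1_up_at_fork_end and server2_absent_at_fork_end)
```

Under those assumptions, both servers were down together for 38 seconds, the last fork path ended while only server1 was up, and the job stayed in RUNNING for about 44 minutes afterwards, including more than 21 minutes after server2 came back.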

      According to the logs, the locking appears to work correctly: forked-action processing is distributed across the two servers when both are running, and is handled by the remaining server when one of them is down. The open question is why even RecoveryService fails to pick up the job after all the forks have completed.

            People

            • Assignee:
              Unassigned
            • Reporter:
              Mona Chitnis (chitnis)