Oozie / OOZIE-1938

Fork-join job sometimes does not execute the join node during HA failover


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: trunk
    • Fix Version/s: None
    • Component/s: HA
    • Labels: None

    Description

      Reported by mchiang.

      Scenario (two Oozie HA servers):
      21:38:56 submit job at oozie client
      21:41:42 shut down server1
      21:46:52 shut down server2
      21:47:30 start server1
      22:15:05 start server2

      The last fork path ended at 21:52:53.
      At 22:36:48 the job was still RUNNING and had not moved to the join node.

      The logs suggest that locking works correctly: forked action processing is distributed between the two servers whether both are running or only one is up. The open question is why even the RecoveryService fails to pick up the job after all the forks have completed.
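      To make the expected behavior concrete, here is a minimal sketch of the check a recovery sweep would need in this situation: a job that is still RUNNING, whose fork paths have all finished, but whose join node never executed. The classes, the function name find_stuck_joins, and the grace_seconds parameter are hypothetical and are not Oozie's actual RecoveryService API. Only the 21:52:53 and 22:36:48 times come from this report; the earlier path's end time is made up, and no calendar date is assumed.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical, simplified model of a fork-join workflow job.
# Illustrative only: NOT Oozie's real classes or RecoveryService API.
@dataclass
class ForkJoinJob:
    job_id: str
    status: str = "RUNNING"
    # fork path name -> end time (None while the path is still running)
    fork_paths: Dict[str, Optional[datetime]] = field(default_factory=dict)
    join_executed: bool = False

    def all_paths_done(self) -> bool:
        # A job with no fork paths is not treated as "done".
        return bool(self.fork_paths) and all(
            end is not None for end in self.fork_paths.values()
        )


def find_stuck_joins(jobs: List[ForkJoinJob], now: datetime, grace_seconds: float) -> List[str]:
    """Return ids of RUNNING jobs whose fork paths all ended more than
    grace_seconds ago but whose join node never executed."""
    stuck = []
    for job in jobs:
        if job.status != "RUNNING" or job.join_executed or not job.all_paths_done():
            continue
        last_end = max(job.fork_paths.values())
        if (now - last_end).total_seconds() > grace_seconds:
            stuck.append(job.job_id)
    return stuck


# Times of day only (strptime defaults the date to 1900-01-01).
t = lambda s: datetime.strptime(s, "%H:%M:%S")
job = ForkJoinJob("job-1", fork_paths={"path-a": t("21:50:00"),   # made-up end time
                                       "path-b": t("21:52:53")})  # last path, from the report
print(find_stuck_joins([job], now=t("22:36:48"), grace_seconds=600))  # ['job-1']
```

      In the reported run the job sat in this state for 2635 seconds (43 min 55 s) after the last fork path ended, so any reasonable grace period should have flagged it.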


          People

            Assignee: Unassigned
            Reporter: Mona Chitnis
            Votes: 0
            Watchers: 2
