OOZIE-2807

Oozie gets RM delegation token even for checking job status



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0.0b1, 4.3.1
    • Component/s: None
    • Labels: None


      We had one user submitting far too many workflows, each with a single Hive query: about 3600 workflows running concurrently. Surprisingly, Oozie held up well without issues.
      But Daryn Sharp from our Hadoop team saw that the number of delegation tokens fetched by Oozie was very high compared to the actual number of jobs submitted. The calls were stressing the RM and pushing it close to its memory limits. This is because we fetch a delegation token every time we create a JobClient, instead of only during job submission.


      So for one job we fetch:
      1) 1 token during submission
      2) 1 token every 5 minutes when we check the status of the job
      3) 1 token after the job ends, to retrieve its status
      4) 1 token if we are killing the job

      So for a job running for 11 minutes, we would have fetched the token 4 times. It may be more in other cases, such as MapReduce, where we check for the end of both the launcher and the child job.
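
      The count above can be reproduced with a small calculation. This is only an illustrative sketch: the class and method names are hypothetical and not part of Oozie, and it assumes one status check per full polling interval elapsed while the job runs.

```java
public class TokenFetchEstimate {

    /**
     * Hypothetical helper: estimates how many delegation tokens are fetched
     * for one job, following the four cases listed in the description.
     */
    static int tokensFetched(int runtimeMinutes, int pollIntervalMinutes, boolean killed) {
        int submission = 1;                                       // case 1
        int statusChecks = runtimeMinutes / pollIntervalMinutes;  // case 2 (integer division)
        int finalStatus = 1;                                      // case 3
        int kill = killed ? 1 : 0;                                // case 4
        return submission + statusChecks + finalStatus + kill;
    }

    public static void main(String[] args) {
        // 11-minute job, checked every 5 minutes, not killed: 1 + 2 + 1 = 4
        System.out.println(tokensFetched(11, 5, false));
    }
}
```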

      Only one of these tokens (the one used in the job submission) is cancelled after the job completes. The other tokens are effectively leaked and are only cleaned up by the RM after the expiry period (24 hours by default). This can make the RM run out of memory.
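
      The leak, and the effect of fetching a token only at submission, can be modelled with a toy simulation. This is a sketch under stated assumptions, not the actual patch: FakeResourceManager and FakeJobClient are hypothetical stand-ins and do not use the real Hadoop or Oozie APIs.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TokenOnSubmitSketch {

    /** Hypothetical stand-in for the RM: it only tracks outstanding tokens. */
    static class FakeResourceManager {
        final AtomicInteger outstandingTokens = new AtomicInteger();
        void issueToken()  { outstandingTokens.incrementAndGet(); }
        void cancelToken() { outstandingTokens.decrementAndGet(); }
    }

    /** Hypothetical client: optionally fetches a token when it is created. */
    static class FakeJobClient {
        FakeJobClient(FakeResourceManager rm, boolean fetchToken) {
            if (fetchToken) {
                rm.issueToken();
            }
        }
    }

    /** Runs one simulated job and returns the tokens still held by the RM. */
    static int runJob(FakeResourceManager rm, int statusChecks, boolean tokenOnlyOnSubmit) {
        new FakeJobClient(rm, true);                    // submission always needs a token
        for (int i = 0; i < statusChecks; i++) {
            new FakeJobClient(rm, !tokenOnlyOnSubmit);  // periodic status check
        }
        new FakeJobClient(rm, !tokenOnlyOnSubmit);      // final status retrieval
        rm.cancelToken();                               // only the submission token is cancelled
        return rm.outstandingTokens.get();
    }

    public static void main(String[] args) {
        // Token per client: 4 issued, 1 cancelled, 3 left until expiry.
        System.out.println(runJob(new FakeResourceManager(), 2, false));
        // Token only on submission: 1 issued, 1 cancelled, 0 left.
        System.out.println(runJob(new FakeResourceManager(), 2, true));
    }
}
```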


        1. OOZIE-2807-1.patch (3 kB, Satish Saley)
        2. OOZIE-2807-2.patch (3 kB, Satish Saley)
        3. OOZIE-2807-3.patch (0.8 kB, Satish Saley)
        4. OOZIE-2807-4.patch (3 kB, Satish Saley)




            Assignee: Satish Saley (satishsaley)
            Reporter: Rohini Palaniswamy (rohini)