Oozie / OOZIE-2807

Oozie gets RM delegation token even for checking job status

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0.0b1, 4.3.1
    • Component/s: None
    • Labels: None

    Description

      We had one user submitting far too many workflows, each with a single Hive query: about 3600 workflows running concurrently. Surprisingly, Oozie held up well without issues.
      But daryn from our Hadoop team saw that the number of delegation tokens fetched by Oozie was very high compared to the actual number of jobs submitted. This was stressing the RM with calls and pushing it close to its memory limits. The cause is that we fetch a delegation token every time we create a JobClient, instead of only during job submission.

      https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/service/HadoopAccessorService.java#L503-L519

      So for one job we fetch
      1) 1 token during submission
      2) 1 token every 5 minutes when we check status of job
      3) 1 token after the job ends to retrieve status.
      4) 1 token if we are killing the job.

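      So for a job running for 11 minutes, we would have fetched a token 4 times: once at submission, twice for the status checks at 5 and 10 minutes, and once after the job ends. It may be more in other cases, such as MapReduce, where we check for the end of both the launcher and the child job.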
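      The count above can be reproduced with a small calculation. This is only an illustrative sketch: the class and method names are hypothetical and not part of Oozie. It assumes exactly one token per status check, that a killed job still gets its final status retrieved, and that it has a single Hadoop job (no separate launcher and child job).

```java
public class TokenFetchEstimate {

    /**
     * Estimates how many delegation tokens are fetched for one job under the
     * behaviour described in this issue. Hypothetical helper, not Oozie code.
     *
     * @param runtimeMinutes      how long the job runs
     * @param pollIntervalMinutes interval between status checks (5 in the description)
     * @param killed              whether the job is killed through Oozie
     */
    public static long tokensFetched(long runtimeMinutes, long pollIntervalMinutes, boolean killed) {
        long submission = 1;                                       // 1) token at submission
        long statusChecks = runtimeMinutes / pollIntervalMinutes;  // 2) one per completed poll interval
        long finalStatus = 1;                                      // 3) token after the job ends
        long kill = killed ? 1 : 0;                                // 4) token when killing the job
        return submission + statusChecks + finalStatus + kill;
    }

    public static void main(String[] args) {
        // The 11-minute job from the description: 1 + 2 + 1 = 4 tokens.
        System.out.println(tokensFetched(11, 5, false));
    }
}
```

      Under these assumptions the number of tokens grows linearly with job runtime, so long-running jobs fetch far more tokens than the single one that submission needs.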

      Only one of these tokens (the one used for job submission) is cancelled after the job completes. The other tokens are effectively leaked and are only cleaned up by the RM after the expiry period (24 hours by default). This can make the RM run out of memory.
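      The leak can be illustrated with a toy model. The sketch below is not Hadoop or Oozie code: `FakeResourceManager` is a hypothetical in-memory stand-in for the RM's token store, and the `tokenPerClient` flag is an assumed switch between the reported behaviour (a token for every client created) and the intended one (a token only at submission).

```java
import java.util.HashSet;
import java.util.Set;

public class TokenLeakSketch {

    /** Hypothetical in-memory stand-in for the RM's token store; not a Hadoop class. */
    static class FakeResourceManager {
        private final Set<Integer> liveTokens = new HashSet<>();
        private int nextId = 0;

        int issueToken() {
            liveTokens.add(nextId);
            return nextId++;
        }

        void cancelToken(int id) {
            liveTokens.remove(id);
        }

        int liveTokenCount() {
            return liveTokens.size();
        }
    }

    /**
     * Simulates one job and returns how many tokens are still held by the RM
     * after the job completes and its submission token is cancelled.
     *
     * @param statusChecks   number of periodic status checks during the run
     * @param tokenPerClient true for the reported behaviour, false for
     *                       fetching a token only at submission
     */
    public static int liveTokensAfterJob(int statusChecks, boolean tokenPerClient) {
        FakeResourceManager rm = new FakeResourceManager();
        int submissionToken = rm.issueToken();      // submission always needs a token

        // Periodic status checks plus the final status retrieval after the job ends.
        for (int i = 0; i < statusChecks + 1; i++) {
            if (tokenPerClient) {
                rm.issueToken();                    // fetched but never cancelled
            }
        }

        rm.cancelToken(submissionToken);            // only this token is cancelled
        return rm.liveTokenCount();
    }

    public static void main(String[] args) {
        System.out.println(liveTokensAfterJob(2, true));   // reported behaviour
        System.out.println(liveTokensAfterJob(2, false));  // token only at submission
    }
}
```

      In this model every token fetched outside submission stays in the RM until expiry, while fetching a token only at submission leaves nothing behind once the job completes.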

      Attachments

        1. OOZIE-2807-4.patch
          3 kB
          Satish Saley
        2. OOZIE-2807-3.patch
          0.8 kB
          Satish Saley
        3. OOZIE-2807-2.patch
          3 kB
          Satish Saley
        4. OOZIE-2807-1.patch
          3 kB
          Satish Saley

          People

            satishsaley Satish Saley
            rohini Rohini Palaniswamy