Hadoop Map/Reduce / MAPREDUCE-2461

Hudson jobs failing because mapred staging directory is full

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 1.0.2
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      All of the tests that submit MR jobs are failing on the h7 build machine. This is because the staging directory is entirely full:

      hudson@h7:/tmp/mr/mr$ ls -l /tmp/hadoop-hudson/mapred/staging/ | wc -l
      31999

      This makes me think that there's some bug where we're leaking things in the staging directory. I will manually clean this for now, but we should investigate.

        Activity

        Todd Lipcon created issue -
        Thomas Weise added a comment -

        We see the same issue with 0.20.204.

        Arun C Murthy made changes -
        Field Original Value New Value
        Fix Version/s 0.24.0 [ 12317654 ]
        Fix Version/s 0.23.0 [ 12315570 ]
        Luke Lu added a comment -

        We ran into this issue as well. The problem comes from the staging area dirs of jobs run in local mode (via LocalJobRunner), which creates a staging area dir as <staging_root>/<user><random>/.staging instead of just <staging_root>/<user>/.staging as in cluster mode. The issue was introduced with the security releases (since 0.20.20x), when the getStagingAreaDir API was introduced.

        The random number for the local mode is presumably used to avoid job collisions, since there is no jobtracker to issue unique job ids.
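To make the two layouts concrete, here is a small illustrative sketch. It is not Hadoop's code: the function name `staging_dir`, its parameters, and the range of the random suffix are assumptions chosen only to mirror the path shapes described above.

```python
import random
from pathlib import PurePosixPath


def staging_dir(staging_root, user, local_mode, rng=None):
    """Build a staging path in the shape described in the comment above.

    Hypothetical helper for illustration only; the suffix range is assumed.
    """
    rng = rng or random.Random()
    # Local mode: <staging_root>/<user><random>/.staging
    # Cluster mode: <staging_root>/<user>/.staging
    owner = f"{user}{rng.randint(0, 2**31 - 1)}" if local_mode else user
    return PurePosixPath(staging_root) / owner / ".staging"


if __name__ == "__main__":
    root = "/tmp/hadoop-hudson/mapred/staging"
    print(staging_dir(root, "hudson", local_mode=False))
    print(staging_dir(root, "hudson", local_mode=True))
```

Because every local-mode job gets a fresh `<user><random>` parent, each run adds a new entry under the staging root unless something removes it, which matches the growth reported in the description.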

        Maybe we can introduce a feature (mapreduce.job.staging.keep=<number of latest jobs to keep>) to prune these directories once in a while.
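A rough sketch of what such pruning could look like, written as a standalone script and not as a Hadoop patch. The function `prune_staging` and its parameters are hypothetical, and `mapreduce.job.staging.keep` is only the property name proposed in this comment, not an existing setting. The sketch assumes local-mode directories are named `<user><digits>` and that modification time is a good enough proxy for "latest".

```python
import shutil
from pathlib import Path


def prune_staging(staging_root, user, keep):
    """Remove all but the `keep` newest <user><digits> dirs; return removed names."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    root = Path(staging_root)
    candidates = [
        d for d in root.iterdir()
        if d.is_dir()
        and d.name.startswith(user)
        # Match only <user><digits>; the plain <user> dir (cluster mode) is skipped.
        and d.name[len(user):].isdigit()
    ]
    # Newest first, judged by directory modification time (an assumption).
    candidates.sort(key=lambda d: d.stat().st_mtime, reverse=True)
    removed = []
    for d in candidates[keep:]:
        shutil.rmtree(d)
        removed.append(d.name)
    return removed


if __name__ == "__main__":
    # Example: keep the 100 most recent local-mode staging dirs for user "hudson".
    print(prune_staging("/tmp/hadoop-hudson/mapred/staging", "hudson", keep=100))
```

Two caveats with this approach: a prefix match can also catch a different user whose name is `<user>` followed by digits, and a directory that belongs to a job still running could be deleted if it falls outside the newest `keep` entries.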

        Dave Latham made changes -
        Affects Version/s 1.0.2 [ 12320047 ]
        Allen Wittenauer added a comment -

        stale

        Allen Wittenauer made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Allen Wittenauer made changes -
        Fix Version/s 0.24.0 [ 12317654 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Todd Lipcon
          • Votes:
            2
            Watchers:
            5

            Dates

            • Created:
              Updated:
              Resolved:
