Hadoop Common / HADOOP-18793

S3A StagingCommitter does not clean up staging-uploads directory


    Description

      When setting up the StagingCommitter and its internal FileOutputCommitter, a temporary directory holding multipart upload (MPU) information is created on the default FS; by default its path is /user/${USER}/tmp/staging/${USER}/${UUID}/staging-uploads.

      On a successful job commit, the child directory (_temporary) is cleaned up properly, but ${UUID}/staging-uploads remains.

      This results in an accumulation of empty ${UUID}/staging-uploads directories under /user/${USER}/tmp/staging/${USER}, and eventually causes failures in environments where the maximum number of items per directory is capped (e.g. by dfs.namenode.fs-limits.max-directory-items in HDFS):

      The directory item limit of /user/${USER}/tmp/staging/${USER} is exceeded: limit=1048576 items=1048576
      	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:1205)
      
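To make the accumulation concrete, here is a minimal local-filesystem sketch of the leftover layout and of a cleanup pass that removes the empty ${UUID}/staging-uploads directories the committer leaves behind. This is an illustration only, not the HDFS fix: the `cleanup_stale_staging_dirs` helper and the use of a local temp directory in place of /user/${USER}/tmp/staging/${USER} are assumptions for the example.

```python
import os
import shutil
import tempfile
import uuid

def cleanup_stale_staging_dirs(staging_root):
    """Remove leftover ${UUID} directories whose staging-uploads child is empty.

    `staging_root` stands in for /user/${USER}/tmp/staging/${USER} from the
    issue description; here it is an ordinary local path for illustration.
    Returns the number of ${UUID} directories removed.
    """
    removed = 0
    for entry in os.listdir(staging_root):
        job_dir = os.path.join(staging_root, entry)
        uploads = os.path.join(job_dir, "staging-uploads")
        # A completed job leaves only an empty staging-uploads dir behind.
        if os.path.isdir(uploads) and not os.listdir(uploads):
            shutil.rmtree(job_dir)
            removed += 1
    return removed

# Demonstrate: create a few stale per-job directories, then clean them up.
root = tempfile.mkdtemp()
for _ in range(3):
    os.makedirs(os.path.join(root, str(uuid.uuid4()), "staging-uploads"))
print(cleanup_stale_staging_dirs(root))  # 3
print(os.listdir(root))  # []
```

With dfs.namenode.fs-limits.max-directory-items at its default of 1048576, a long-running workload that never prunes these directories will hit the limit shown in the stack trace above.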

People

  Assignee: hdaikoku Harunobu Daikoku
  Reporter: hdaikoku Harunobu Daikoku
  Votes: 1
  Watchers: 5
