  1. Hadoop Common
  2. HADOOP-14028

S3A BlockOutputStreams don't delete temporary files in multipart uploads or handle part upload failures

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha4
    • Component/s: fs/s3
    • Labels:
      None
    • Environment:

      JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2

      Description

      I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I was looking for after running into the same OOM problems), but I don't see it cleaning up the disk-cached blocks.

      I'm generating a ~50GB file on an instance that has ~6GB of free disk space when the process starts. My expectation is that the local copy of each block would be deleted once that part finishes uploading, but I'm seeing more than 15 blocks in /tmp, and none of them have been deleted so far.

      I see that DiskBlock deletes temporary files when closed, but is it closed after individual blocks have finished uploading or when the entire file has been fully written to the FS (full upload completed, including all parts)?

      As a temporary workaround to avoid running out of space, I'm listing the files sorted by atime (newest first) and deleting everything except the 20 most recently accessed: `ls -ut | tail -n +21 | xargs rm`
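The same workaround can be sketched in Java for cases where a shell pipeline is inconvenient. This is only an illustration: the class name `BlockCachePruner` and the method `pruneOldest` are hypothetical and are not part of hadoop-aws. Like the one-liner, it cannot tell whether a block has already been uploaded, so it is a stopgap rather than a fix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical helper, not part of hadoop-aws: keeps the `keep` most recently
// accessed regular files in a directory and deletes the rest.
class BlockCachePruner {
    static List<Path> pruneOldest(Path dir, int keep) throws IOException {
        // Snapshot access times first so sorting sees stable values.
        Map<Path, FileTime> atimes = new HashMap<>();
        try (Stream<Path> entries = Files.list(dir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                try {
                    BasicFileAttributes attrs =
                        Files.readAttributes(p, BasicFileAttributes.class);
                    if (attrs.isRegularFile()) {
                        atimes.put(p, attrs.lastAccessTime());
                    }
                } catch (NoSuchFileException e) {
                    // The file vanished between listing and stat; skip it.
                }
            }
        }

        // Newest first, mirroring `ls -ut`.
        List<Path> byNewest = new ArrayList<>(atimes.keySet());
        byNewest.sort(Comparator.comparing((Path p) -> atimes.get(p)).reversed());

        // Everything past the first `keep` entries, mirroring `tail -n +21`.
        List<Path> deleted = new ArrayList<>();
        int from = Math.min(Math.max(keep, 0), byNewest.size());
        for (Path p : byNewest.subList(from, byNewest.size())) {
            if (Files.deleteIfExists(p)) {
                deleted.add(p);
            }
        }
        return deleted;
    }
}
```

Calling `BlockCachePruner.pruneOldest(dir, 20)` on the buffer directory would correspond to the shell command above. Access times depend on the filesystem's mount options (for example `noatime`), so the ordering may not reflect real usage everywhere.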

      Steve Loughran says:

      > They should be deleted as soon as the upload completes; the close() call that the AWS httpclient makes on the input stream triggers the deletion. Though there aren't tests for it, as I recall.
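The delete-on-close behaviour described in the reply can be illustrated with a small wrapper stream. This is a minimal sketch, not the actual S3A `DiskBlock` code: the class name `DeleteOnCloseInputStream` is hypothetical, and it assumes the consumer of the stream (here, the HTTP client) always calls `close()`.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not the hadoop-aws implementation: an input
// stream over a temporary file that removes the file when it is closed.
class DeleteOnCloseInputStream extends FilterInputStream {
    private final Path path;
    private boolean closed;

    DeleteOnCloseInputStream(Path path) throws IOException {
        super(Files.newInputStream(path));
        this.path = path;
    }

    @Override
    public synchronized void close() throws IOException {
        if (closed) {
            return; // close() is idempotent; the file is already gone
        }
        closed = true;
        try {
            super.close();
        } finally {
            // Runs even if closing the underlying stream throws.
            Files.deleteIfExists(path);
        }
    }
}
```

With this design the temporary file is only removed if something actually closes the stream. If a part upload fails, or the stream is handed over in a way that bypasses `close()`, the file stays on disk, which would be consistent with the leftover blocks reported above.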

        Attachments

        1. HADOOP-14028-branch-2-009.patch
          58 kB
          Steve Loughran
        2. HADOOP-14028-branch-2-008.patch
          58 kB
          Steve Loughran
        3. HADOOP-14028-branch-2-001.patch
          36 kB
          Steve Loughran
        4. HADOOP-14028-branch-2.8-008.patch
          58 kB
          Steve Loughran
        5. HADOOP-14028-branch-2.8-007.patch
          58 kB
          Steve Loughran
        6. HADOOP-14028-branch-2.8-005.patch
          53 kB
          Steve Loughran
        7. HADOOP-14028-branch-2.8-004.patch
          42 kB
          Steve Loughran
        8. HADOOP-14028-branch-2.8-003.patch
          42 kB
          Steve Loughran
        9. HADOOP-14028-branch-2.8-002.patch
          39 kB
          Steve Loughran
        10. HADOOP-14028-007.patch
          58 kB
          Steve Loughran
        11. HADOOP-14028-006.patch
          58 kB
          Steve Loughran

              People

              • Assignee: Steve Loughran (stevel@apache.org)
              • Reporter: Seth Fitzsimmons (mojodna)
              • Votes: 0
              • Watchers: 11
