Hadoop Common · HADOOP-18477 Über-jira: S3A Hadoop 3.3.9 features · HADOOP-15944

S3AInputStream logging to make it easier to debug file leakage


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1.1
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      Problem: if an app opens too many input streams, all the http connections in the S3A pool can be used up; every other FS operation then fails, timing out while waiting for a connection from the http pool.

      Proposed simple solution: improve logging of the input stream lifecycle, specifically:

      1. include the URL of the file in open, reopen & close events
      2. maybe: a separate logger for these events, though the S3AInputStream logger should be enough, as it doesn't log much else
      3. maybe: a fixed prefix in the events such as "Lifecycle", so that you can run the existing log at debug, grep for that phrase and look at the printed URLs to see which streams are opened but never closed (see the sketch below)
      4. stream metrics: expose some of the state of the http connection pool and/or the count of active input and output streams

      Idle output streams don't use up http connections, as they only connect during block upload.
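
      A minimal sketch of the kind of lifecycle logging proposed above; the class, logger name, method names and log wording are illustrative assumptions, not the actual S3AInputStream code. It uses SLF4J, which Hadoop already uses for logging.

          import org.slf4j.Logger;
          import org.slf4j.LoggerFactory;

          /** Illustrative helper: logs open/reopen/close events for one input stream. */
          public class StreamLifecycleLog {

            // Hypothetical dedicated logger name (proposal item 2); the existing
            // S3AInputStream logger would probably be enough.
            private static final Logger LOG =
                LoggerFactory.getLogger("org.apache.hadoop.fs.s3a.S3AInputStream.lifecycle");

            private final String uri;   // s3a:// URL of the file being read
            private long openCount;     // number of times the http stream was (re)opened

            public StreamLifecycleLog(String uri) {
              this.uri = uri;
            }

            /** Log the first open of the underlying http stream. */
            public void onOpen(long pos) {
              openCount++;
              // Fixed "Lifecycle:" prefix (proposal item 3) so the events are easy to grep for.
              LOG.debug("Lifecycle: open {} at pos {} (open #{})", uri, pos, openCount);
            }

            /** Log a reopen forced by a seek or a backwards read. */
            public void onReopen(long pos, String reason) {
              openCount++;
              LOG.debug("Lifecycle: reopen {} at pos {} reason={} (open #{})",
                  uri, pos, reason, openCount);
            }

            /** Log close(); a leaked stream never emits this line. */
            public void onClose() {
              LOG.debug("Lifecycle: close {} after {} open(s)", uri, openCount);
            }
          }

      With something like this in place you can enable the logger at debug, grep the log for "Lifecycle:", and compare open/reopen lines with close lines per URL: any file with opens but no matching close is a candidate for a leaked stream holding an http connection.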

Attachments

Issue Links

Activity

People

Assignee: Unassigned
Reporter: Steve Loughran (stevel@apache.org)
Votes: 0
Watchers: 3

Dates

Created:
Updated: