HADOOP-15944: S3AInputStream logging to make it easier to debug file leakage

Project: Hadoop Common
Parent: HADOOP-15620 Über-jira: S3A phase VI: Hadoop 3.3 features

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1.1
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

      Description

      Problem: if an application opens too many input streams, all the HTTP connections in the S3A pool can be used up; every other filesystem operation then fails, timing out while waiting for a connection from the pool.

      Proposed simple solution: log the input stream lifecycle more clearly (a sketch of items 1-3 follows the list below), specifically:

      1. include the URL of the file in open, reopen & close events
      2. maybe: a separate logger for these events, though the S3AInputStream logger should be enough, as it doesn't log much else.
      3. maybe: prefix these events with a phrase like "Lifecycle", so that you can turn the existing log up to debug, grep for that phrase and look at the printed URLs to identify what's going on
      4. stream metrics: expose some of the state of the HTTP connection pool and/or the number of active input and output streams
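
      A minimal sketch of what items 1-3 could look like, using SLF4J (which Hadoop already uses for logging). The class, logger name, "Lifecycle:" prefix and method signatures here are illustrative assumptions, not the actual S3AInputStream code:

        import org.slf4j.Logger;
        import org.slf4j.LoggerFactory;

        // Hypothetical sketch of lifecycle logging for an S3A input stream.
        class StreamLifecycleLogging {
          // Item 2: a dedicated logger for these events; the name is an assumption.
          private static final Logger LOG =
              LoggerFactory.getLogger("org.apache.hadoop.fs.s3a.S3AInputStream");

          // Item 1: the s3a:// URL of the file this stream reads.
          private final String uri;

          StreamLifecycleLogging(String uri) {
            this.uri = uri;
            // Item 3: a fixed "Lifecycle" prefix makes the events greppable.
            LOG.debug("Lifecycle: opened {}", uri);
          }

          void reopen(long targetPos) {
            LOG.debug("Lifecycle: reopened {} at {}", uri, targetPos);
          }

          @Override
          public void close() {
            LOG.debug("Lifecycle: closed {}", uri);
          }
        }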

      Idle output streams don't use up HTTP connections, as they only connect during block upload.
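
      With logging like the sketch above in place, tracking down a leak would come down to enabling debug for the stream logger and grepping. Assuming the logger name used in the sketch, the log4j.properties entry would be:

        log4j.logger.org.apache.hadoop.fs.s3a.S3AInputStream=DEBUG

      Then grep the log for "Lifecycle: opened" and "Lifecycle: closed": any URL which shows up in the former but never in the latter is a stream being leaked.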

            People

            • Assignee: Unassigned
            • Reporter: Steve Loughran (stevel@apache.org)
            • Votes: 0
            • Watchers: 2
