Hadoop Common / HADOOP-19272

S3A: AWS SDK 2.25.53 warnings logged about transfer manager not using CRT client

Details

    Description

      When an S3 transfer manager is created for a rename or download, a new message is logged telling off the caller for not using the CRT client.

      5645:2024-09-13 16:29:17,375 [setup] WARN  s3.S3TransferManager (LoggerAdapter.java:warn(225)) - The provided S3AsyncClient is an instance of MultipartS3AsyncClient, and thus multipart download feature is not enabled. To benefit from all features, consider using S3AsyncClient.crtBuilder().build() instead.
      

      This is a change in the SDK to tell us developers off, yet it is visible to end users, who don't benefit from it and for whom it only creates confusion.

      It appears to have been downgraded to debug in the AWS trunk code in the PR "S3 Async Client - Multipart download" (#5164), but:

      • it is too late to upgrade and qualify a new version for 3.4.1; downgrading is all we can do
      • there is no guarantee that this log message, or a similar one, will not reoccur.

      Plan
      1. Revert from 3.4.1
      2. Lift code from the cloudstore library which uses reflection to access and manipulate log4j logs where present.
      3. Downgrade all transfer manager log levels to NONE.
      4. File an AWS report about how this is an incompatible regression, identify how their process can evolve, particularly in the area of code guidelines about safe logging use.
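
      Steps 2 and 3 of the plan could look roughly like the sketch below. It is not the cloudstore code: the class and method names (Log4jLevelSketch, setLevel) are hypothetical, and the transfer manager logger name in the usage note is an assumption inferred from the abbreviated "s3.S3TransferManager" in the log line. It uses only the log4j 1.x calls Logger.getLogger(String), Level.toLevel(String) and setLevel(Level), all looked up reflectively so that nothing breaks when log4j is not on the classpath. Note that log4j 1.x has no NONE level; OFF is the nearest equivalent, and Level.toLevel() falls back to DEBUG for names it does not recognize.

```java
/**
 * Sketch only: change a log4j 1.x logger level through reflection,
 * so the caller has no compile-time dependency on log4j.
 * All names in this class are hypothetical.
 */
final class Log4jLevelSketch {

  static final String LOGGER_CLASS = "org.apache.log4j.Logger";
  static final String LEVEL_CLASS = "org.apache.log4j.Level";

  private Log4jLevelSketch() {
  }

  /** Set the level of a log using the standard log4j 1.x class names. */
  static boolean setLevel(String logName, String levelName) {
    return setLevel(LOGGER_CLASS, LEVEL_CLASS, logName, levelName);
  }

  /**
   * Set the level of a log.
   * @return true if the level was set; false if the logging classes
   *         or methods could not be found or invoked.
   */
  static boolean setLevel(String loggerClass, String levelClass,
      String logName, String levelName) {
    try {
      Class<?> logger = Class.forName(loggerClass);
      Class<?> level = Class.forName(levelClass);
      // static Logger.getLogger(String)
      Object log = logger.getMethod("getLogger", String.class)
          .invoke(null, logName);
      // static Level.toLevel(String); unknown names map to DEBUG
      Object lvl = level.getMethod("toLevel", String.class)
          .invoke(null, levelName);
      // instance setLevel(Level), inherited from Category
      logger.getMethod("setLevel", level).invoke(log, lvl);
      return true;
    } catch (ReflectiveOperationException | LinkageError e) {
      // log4j absent or incompatible: leave logging untouched
      return false;
    }
  }
}
```

      A caller would then do something like Log4jLevelSketch.setLevel("software.amazon.awssdk.transfer.s3.S3TransferManager", "OFF") and ignore a false result, since that only means log4j is not the logging back end in use.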

      I also intend to tighten up our review process to support more rigorous detection of new .warn() messages in the AWS SDK. I'm going to propose that, as well as requiring review of our test/CLI output, we require ripgrep scans for .warn( and .error( in the SDK source and an audit of any new entries. By saving the output of the previous iteration, it'll be straightforward to identify new messages, though not changes in codepaths which alter how often an existing message appears.
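
      The comparison against the previous scan could be sketched as below. It assumes each scan was saved with ripgrep's line numbers enabled, so every entry has the form path:line:text; the class and method names (LogScanDiff, newEntries) are hypothetical. Line numbers are dropped before comparing, so a statement which merely moved within a file is not reported as new.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch only: report the entries of a new scan which were
 * not present in the previous one. All names are hypothetical.
 */
final class LogScanDiff {

  /** Matches "path:line:text" as emitted with line numbers enabled. */
  private static final Pattern RG_LINE =
      Pattern.compile("^(.*?):(\\d+):(.*)$");

  private LogScanDiff() {
  }

  /** Reduce an entry to "path:trimmed text", discarding the line number. */
  static String normalize(String line) {
    Matcher m = RG_LINE.matcher(line);
    return m.matches()
        ? m.group(1) + ":" + m.group(3).trim()
        : line.trim();
  }

  /** Return the entries of current whose normalized form is not in previous. */
  static List<String> newEntries(List<String> previous, List<String> current) {
    Set<String> seen = new HashSet<>();
    for (String line : previous) {
      seen.add(normalize(line));
    }
    List<String> added = new ArrayList<>();
    for (String line : current) {
      if (!seen.contains(normalize(line))) {
        added.add(line);
      }
    }
    return added;
  }
}
```

      As noted above, this only finds statements which are new in the source; it says nothing about an existing statement which a changed codepath now reaches more often.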

      I think we should revisit whether or not to move off the transfer manager. We've discussed it in the past, and avoided it purely because of the maintenance costs. However, staying on it is imposing maintenance costs anyway.

      Meanwhile: no new AWS SDK updates until we are confident we have our processes under control.

      Attachments

        1. output.txt
          10 kB
          Steve Loughran

            People

              stevel@apache.org Steve Loughran