Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:
    • Release Note:
      S3A now includes the current Hadoop version in the User-Agent string passed through the AWS SDK to the S3 service. Users also may include optional additional information to identify their application. See the documentation of configuration property fs.s3a.user.agent.prefix for further details.

      Description

      S3A passes a User-Agent header to the S3 back-end. Right now, it uses the default value set by the AWS SDK, so Hadoop HTTP traffic doesn't appear any different from general AWS SDK traffic. If we customize the User-Agent header, then it will enable better troubleshooting and analysis by AWS or alternative providers of S3-like services.

        Activity

        cnauroth Chris Nauroth added a comment -

        I'm attaching patch v001.

        This sets up the AWS SDK ClientConfiguration to prepend the Hadoop version number to the User-Agent. If the configuration property fs.s3a.user.agent.prefix is specified, then its value is prepended in front of that. This information is additive, not a full replacement: the information included by the AWS SDK is still present.
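The composition described above can be sketched in plain Java. This is a minimal illustration, not the code from the patch: the class name UserAgentSketch and both method names are hypothetical, the Hadoop version is passed in as a plain string rather than read from the build details, and the AWS SDK's contribution is simulated by simple concatenation.

```java
// Hypothetical sketch: all names here are illustrative and not taken from the patch.
final class UserAgentSketch {

    /**
     * Builds the custom part of the User-Agent: an optional prefix followed by
     * "Hadoop <version>". A null or blank prefix is treated as "not configured".
     */
    static String customUserAgent(String prefix, String hadoopVersion) {
        String userAgent = "Hadoop " + hadoopVersion;
        String trimmed = (prefix == null) ? "" : prefix.trim();
        if (!trimmed.isEmpty()) {
            userAgent = trimmed + ", " + userAgent;
        }
        return userAgent;
    }

    /**
     * Simulates the additive behaviour: the SDK information stays at the end
     * of the header instead of being replaced.
     */
    static String fullUserAgent(String prefix, String hadoopVersion, String sdkInfo) {
        return customUserAgent(prefix, hadoopVersion) + ", " + sdkInfo;
    }

    public static void main(String[] args) {
        String sdkInfo = "aws-sdk-java/1.10.6 Mac_OS_X/10.9.5";
        System.out.println(fullUserAgent(null, "3.0.0-SNAPSHOT", sdkInfo));
        System.out.println(fullUserAgent("MyApp", "3.0.0-SNAPSHOT", sdkInfo));
    }
}
```

Treating a blank prefix the same as a missing one is an assumption of this sketch; it avoids emitting a header that starts with a stray comma.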

        TestS3AConfiguration includes tests with and without the custom prefix. I refactored a bit to share code with other tests that need to use reflection to check AWS SDK internals.

        In addition to the unit tests, I ran manual testing live against S3, with the following in log4j.properties:

        log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
        log4j.logger.org.apache.http=DEBUG
        log4j.logger.org.apache.http.wire=ERROR
        

        I could see the initialization log message announcing what it would use for the User-Agent. Then, the HTTP Components log messages showed that the User-Agent was in fact passing all the way through to the HTTP call:

        Before Patch

        hadoop fs -ls s3a://cnauroth-test-aws-s3a/
        
        User-Agent: aws-sdk-java/1.10.6 Mac_OS_X/10.9.5 Java_HotSpot(TM)_64-Bit_Server_VM/24.65-b04/1.7.0_67
        

        After Patch

        hadoop fs -ls s3a://cnauroth-test-aws-s3a/
        
        User-Agent: Hadoop 3.0.0-SNAPSHOT, aws-sdk-java/1.10.6 Mac_OS_X/10.9.5 Java_HotSpot(TM)_64-Bit_Server_VM/24.65-b04/1.7.0_67
        

        After Patch/Custom Prefix:

        hadoop fs -Dfs.s3a.user.agent.prefix=MyApp -ls s3a://cnauroth-test-aws-s3a/
        
        User-Agent: MyApp, Hadoop 3.0.0-SNAPSHOT, aws-sdk-java/1.10.6 Mac_OS_X/10.9.5 Java_HotSpot(TM)_64-Bit_Server_VM/24.65-b04/1.7.0_67
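For the log analysis this change is meant to enable, a service operator could pick the Hadoop version and the optional prefix out of a header value shaped like the ones above. The following is only a sketch: the class S3AUserAgentParser and its methods are hypothetical, and the pattern assumes the SDK segment begins with "aws-sdk-java/" as in these logs, which may not hold for other SDK versions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for classifying logged User-Agent values; not part of Hadoop.
final class S3AUserAgentParser {

    // Optional "<prefix>, ", then "Hadoop <version>, ", then the SDK segment.
    // Assumption: the SDK segment starts with "aws-sdk-java/".
    private static final Pattern S3A_USER_AGENT =
        Pattern.compile("^(?:(.*), )?Hadoop ([^,\\s]+), (aws-sdk-java/.*)$");

    /** Returns the Hadoop version if the value has the S3A shape shown above. */
    static Optional<String> hadoopVersion(String userAgent) {
        Matcher m = S3A_USER_AGENT.matcher(userAgent);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }

    /** Returns the custom prefix, or empty if none was set or the value does not match. */
    static Optional<String> customPrefix(String userAgent) {
        Matcher m = S3A_USER_AGENT.matcher(userAgent);
        return m.matches() ? Optional.ofNullable(m.group(1)) : Optional.empty();
    }
}
```

With the three header values shown above, the "Before Patch" value yields no version, while both "After Patch" values yield 3.0.0-SNAPSHOT, and only the last one yields the prefix MyApp.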
        
        stevel@apache.org Steve Loughran added a comment -

        We aren't going to be exposing any security/privacy information with this request, are we?

        cnauroth Chris Nauroth added a comment -

        No, I don't think there is a risk of security exposure. The format of the User-Agent will be <custom prefix>, <Hadoop version>, <SDK info>. The <SDK info> part is controlled completely by the AWS SDK. This is what gets sent today without the patch. The <Hadoop version> is filled in programmatically from the build details embedded in the jar, so I don't expect this would ever contain anything sensitive. I suppose the only problem is if a user willfully set something sensitive into fs.s3a.user.agent.prefix. I wouldn't expect that to happen in practice, but if you feel there is a risk here, then I can add a note in core-default.xml and the docs warning people not to do that. Let me know your thoughts.

        stevel@apache.org Steve Loughran added a comment -

        I'm happy then; the only thing we are exposing there is the Java version. From a normal browser, that and the Flash version enumerate your vulnerabilities to all. Here, you'd better trust your endpoint, and if you are using the HTTPS connection to S3, you get that.

        +1

        stevel@apache.org Steve Loughran added a comment -

        Applied to branch-2.8+. I'll now need to fix my own patches so that they merge again.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9751 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9751/)
        HADOOP-13122 Customize User-Agent header sent in HTTP requests by S3A. (stevel: rev def2a6d3856452d5c804f04e5bf485541a3bc53a)

        • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AConfiguration.java
        • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
        • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
        • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
        • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java

          People

          • Assignee:
            cnauroth Chris Nauroth
            Reporter:
            cnauroth Chris Nauroth
          • Votes:
            0
            Watchers:
            4
