Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.8.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:
    • Release Note:
      We cannot guarantee that client-side encryption works in applications which use the Hadoop FileSystem APIs.

      Client-side encryption breaks a fundamental expectation of much existing code: that the amount of data you can read from a file equals the size of the file when listed. Because of padding and round-up, a read may return less data than you expect, seek(length(file)-1) can fail, read(bytes,length(file)) may fail, and so on. Be assured, code will break. These failures will surface as you use this feature; some may be fixable, and others may not be, at least not easily.
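
      The padding effect described above can be reproduced with the JDK alone. The sketch below is illustrative only: it assumes AES in CBC mode with PKCS5 padding as a representative padded cipher, and the class and method names (CsePaddingDemo, ciphertextLength) are hypothetical. The cipher and overhead actually used depend on how the SDK's encryption is configured.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class: compares plaintext length with padded ciphertext length.
class CsePaddingDemo {

    /**
     * Encrypts plaintextLength zero bytes with AES/CBC/PKCS5Padding
     * (an assumed, representative cipher) and returns the ciphertext length.
     */
    static int ciphertextLength(int plaintextLength) {
        try {
            // All-zero key and IV: acceptable only because this demo measures lengths.
            SecretKeySpec key = new SecretKeySpec(new byte[16], "AES");
            IvParameterSpec iv = new IvParameterSpec(new byte[16]);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key, iv);
            return cipher.doFinal(new byte[plaintextLength]).length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 15, 16, 100}) {
            System.out.println(n + " plaintext bytes -> "
                + ciphertextLength(n) + " stored bytes");
        }
    }
}
```

      Under this assumption, 100 bytes of plaintext occupy 112 stored bytes, so a listing that reports the stored size overstates what a reader can actually consume by 12 bytes.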

      If you do find problems, you will have to raise them with the particular application or library which has the issue, and see whether or not they can or will fix it. We cannot fix it in the Hadoop S3A filesystem, because this code only reports back the file lengths supplied by the S3 endpoint: if there is a mismatch, the client code does not know of it. Indeed, it is likely that S3 itself does not know of it, because the decryption takes place on the client.

      For this reason, you cannot simply turn on client-side encryption and expect everything to "just" work. You should restrict it to specific uses which you can test, such as writing data out for use by an external application, or carefully importing data uploaded by other processes.
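
      One way an application can tolerate the length mismatch when importing data is to read until end-of-stream instead of sizing buffers or loops from the listed length. The following is a minimal sketch using only java.io; the class name ReadToEof and the example lengths are hypothetical, and the same loop would apply to any InputStream, which is an assumption about how the calling code obtains its stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: reads a stream to EOF without trusting a reported length.
class ReadToEof {

    /** Returns every byte the stream yields, stopping only when read() reports EOF. */
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated mismatch: the listing claims 112 bytes, the stream holds 100.
        long listedLength = 112;
        byte[] data = readAll(new ByteArrayInputStream(new byte[100]));
        System.out.println("listed=" + listedLength + " actuallyRead=" + data.length);
    }
}
```

      Code that instead allocates a buffer of the listed length and insists on filling it will hit an unexpected end of stream in this situation, which is the class of failure the note warns about.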

      Description

      Expose the client-side encryption option described in the Amazon S3 documentation: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html

      Currently this is not exposed in Hadoop, but it is exposed as an option in the AWS Java SDK, which Hadoop already includes. It should be trivial to propagate this as a parameter passed to the S3 client used in S3AFileSystem.java.

        Attachments

        1. S3-CSE Proposal.pdf
          160 kB
          Steve Moist
        2. HADOOP-14171-001.patch
          4 kB
          Steve Loughran
        3. HADOOP-13897-trunk-013.patch
          51 kB
          Igor Mazur
        4. HADOOP-13897-trunk-011.patch
          51 kB
          Igor Mazur
        5. HADOOP-13897-branch-2-014.patch
          51 kB
          Igor Mazur
        6. HADOOP-13897-branch-2-012.patch
          51 kB
          Igor Mazur
        7. HADOOP-13897-branch-2-010.patch
          51 kB
          Igor Mazur
        8. HADOOP-13897-branch-2-009.patch
          49 kB
          Igor Mazur
        9. HADOOP-13897-branch-2-008.patch
          38 kB
          Igor Mazur
        10. HADOOP-13897-branch-2-006.patch
          36 kB
          Igor Mazur
        11. HADOOP-13897-branch-2-005.patch
          36 kB
          Igor Mazur
        12. HADOOP-13897-branch-2-004.patch
          36 kB
          Igor Mazur
        13. HADOOP-13887-branch-2-003.patch
          33 kB
          Igor Mazur
        14. HADOOP-13887-007.patch
          38 kB
          Igor Mazur
        15. HADOOP-13887-002.patch
          34 kB
          Igor Mazur

              People

              • Assignee:
                Igor Mazur
              • Reporter:
                Jeeyoung Kim (jeeyoungk)
              • Votes:
                2
              • Watchers:
                17

                Dates

                • Created:
                  Updated: