Hadoop Common
HADOOP-930

Add support for reading regular (non-block-based) files from S3 in S3FileSystem

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.1
    • Fix Version/s: 0.18.0
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added support for reading and writing native S3 files. Native S3 files are referenced using s3n URIs. See http://wiki.apache.org/hadoop/AmazonS3 for more details.
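
      For context, a minimal sketch of how a client might address native S3 files, assuming a hypothetical bucket and path; the fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey property names come from the patch discussion below:

        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class S3nReadExample {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Credentials for the native S3 filesystem (property names from the v2 patch).
            conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");      // placeholder
            conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");  // placeholder

            // "mybucket" and "input/part-00000" are hypothetical names.
            Path input = new Path("s3n://mybucket/input/part-00000");
            FileSystem fs = FileSystem.get(URI.create("s3n://mybucket/"), conf);
            System.out.println("exists: " + fs.exists(input));
          }
        }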

      Description

      People often have input data on S3 that they want to use for a MapReduce job, and the current S3FileSystem implementation cannot read it since it assumes a block-based format.

      We would add the following metadata to files written by S3FileSystem: an indication that it is block oriented ("S3FileSystem.type=block") and a filesystem version number ("S3FileSystem.version=1.0"). Regular S3 files would not have the type metadata so S3FileSystem would not try to interpret them as inodes.

      An extension to write regular files to S3 would not be covered by this change - we could do this as a separate piece of work (we still need to decide whether to introduce another scheme - e.g. rename block-based S3 to "s3fs" and call regular S3 "s3" - or whether to just use a configuration property to control block-based vs. regular writes).
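
      The proposed metadata check can be illustrated with a minimal sketch, assuming jets3t's S3Object metadata accessors; the helper names are hypothetical:

        import org.jets3t.service.model.S3Object;

        public class S3FileSystemTypeCheck {
          // Proposed metadata keys from the description above.
          static final String TYPE_KEY = "S3FileSystem.type";
          static final String VERSION_KEY = "S3FileSystem.version";

          // Hypothetical helper: mark an object as a block-format inode before writing it.
          static void tagAsBlockFile(S3Object object) {
            object.addMetadata(TYPE_KEY, "block");
            object.addMetadata(VERSION_KEY, "1.0");
          }

          // Hypothetical helper: regular S3 files carry no type metadata,
          // so S3FileSystem would not try to interpret them as inodes.
          static boolean isBlockFile(S3Object object) {
            Object type = object.getMetadata(TYPE_KEY);
            return type != null && "block".equals(type.toString());
          }
        }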

      Attachments

      1. hadoop-930.patch (96 kB) - Tom White
      2. hadoop-930-v2.patch (98 kB) - Tom White
      3. hadoop-930-v3.patch (99 kB) - Tom White
      4. hadoop-930-v4.patch (99 kB) - Tom White
      5. hadoop-930-v5.patch (99 kB) - Tom White
      6. jets3t-0.6.0.jar (282 kB) - Tom White

        Issue Links

          Activity

          Tom White added a comment -

          Here's a patch for a native S3 filesystem.

          • Writes are supported.
          • The scheme is s3n, making it completely independent of the existing block-based S3 filesystem. It might be possible to make a general (read-only) S3 filesystem that can read both types, but I haven't attempted that here (it can go in another Jira if needed).
          • Empty directories are written using the naming convention of appending "_$folder$" to the key. This is the approach taken by S3Fox, and - crucially for efficiency - it makes it possible to tell whether a key represents a file or a directory from a single list-bucket operation.
          • There's a new unit test (FileSystemContractBaseTest) for the contract of FileSystem, to ensure that different implementations are consistent. Both S3 filesystems and HDFS are tested using this test. It would be good to add other filesystems later.
          • Renames are not supported, as S3 doesn't support them natively (yet). It would be possible to support renames by having the client copy the data out of S3 and back again (a sketch follows this list).
          • The Jets3t library has been upgraded to the latest version (0.6.0).
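
          A rough sketch of the rename-by-copy approach mentioned in the list above, assuming jets3t's basic get/put/delete calls; error handling and streaming of large objects are omitted:

            import org.jets3t.service.S3Service;
            import org.jets3t.service.model.S3Bucket;
            import org.jets3t.service.model.S3Object;

            public class ClientSideRename {
              // Hypothetical helper: "rename" srcKey to dstKey by copying the data
              // through the client and deleting the original, since S3 has no
              // server-side rename.
              static void rename(S3Service service, S3Bucket bucket, String srcKey, String dstKey)
                  throws Exception {
                S3Object src = service.getObject(bucket, srcKey);
                S3Object dst = new S3Object(dstKey);
                dst.setDataInputStream(src.getDataInputStream());
                dst.setContentLength(src.getContentLength());
                service.putObject(bucket, dst);
                service.deleteObject(bucket, srcKey);
              }
            }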
          Tom White added a comment -

          Second patch with the following changes:

          • Send a Content-MD5 header to perform message integrity checks on writes (a sketch follows this list).
          • Fix warnings from Jets3t to do with not closing streams.
          • Change property names to be independent of the existing S3FileSystem: fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey.
          • Findbugs and formatting fixes.
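
          For illustration, a minimal sketch of the write-side integrity check, assuming jets3t's setMd5Hash accessor is what puts the Content-MD5 header on the PUT; the key name is hypothetical:

            import java.io.ByteArrayInputStream;
            import java.security.MessageDigest;
            import org.jets3t.service.S3Service;
            import org.jets3t.service.model.S3Bucket;
            import org.jets3t.service.model.S3Object;

            public class ContentMd5PutExample {
              static void putWithMd5(S3Service service, S3Bucket bucket, byte[] data) throws Exception {
                // Compute the MD5 of the payload so S3 can verify it on receipt.
                byte[] md5 = MessageDigest.getInstance("MD5").digest(data);

                S3Object object = new S3Object("example-key");  // hypothetical key
                object.setDataInputStream(new ByteArrayInputStream(data));
                object.setContentLength(data.length);
                object.setMd5Hash(md5);  // assumed jets3t accessor; sent as Content-MD5

                // S3 rejects the PUT if the body doesn't match the declared MD5.
                service.putObject(bucket, object);
              }
            }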
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381601/hadoop-930-v2.patch
          against trunk revision 654265.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 159 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 208 release audit warnings (more than the trunk's current 207 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2420/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2420/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2420/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2420/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2420/console

          This message is automatically generated.

          Tom White added a comment -

          Fixed release audit warnings and a few formatting warnings from checkstyle.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381659/hadoop-930-v3.patch
          against trunk revision 654315.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 159 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2428/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2428/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2428/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2428/console

          This message is automatically generated.

          Owen O'Malley added a comment -

          Can someone validate that this code works for them?

          Tom White added a comment -

          Merged with trunk.

          Chris K Wensel added a comment - edited

          – Any reason you didn't use the mime type to denote directory files (as jets3t does)?

            public static boolean isDirectory( S3Object object )
              {
              return object.getContentType() != null && object.getContentType().equalsIgnoreCase( MIME_DIRECTORY );
              }

          where

            public static final String MIME_DIRECTORY = "application/x-directory";

          – I believe an MD5 checksum should be set on the S3 put (via a header) and verified on the S3 get. I see plenty of read failures because of checksum failures (though in retrospect they could be side effects of stream-reading timeouts). This is especially useful if non-Hadoop applications are dealing with the S3 data shared with Hadoop.

          – Sometimes 'legacy' bucket names contain underscores; you might consider trying to survive them:

              String userInfo = uri.getUserInfo();

              // special handling for underscores in bucket names
              if( userInfo == null )
                {
                String authority = uri.getAuthority();
                String[] split = authority.split( "[:@]" );

                if( split.length >= 2 )
                  userInfo = split[ 0 ] + ":" + split[ 1 ];
                }

          and

              String bucketName = uri.getAuthority();

              // handling for underscore in bucket name
              if( bucketName.contains( "@" ) )
                bucketName = bucketName.split( "@" )[ 1 ];
          Tom White added a comment -

          Thanks for the review Chris.

          Any reason you didn't use the mime type to denote directory files (as jets3t does)?

          It's to do with the efficiency of listing directories. If you use the mime type, you can't tell the difference between files and directories when listing bucket keys, so you have to query each key in a directory, which can be prohibitively slow. But if you use the _$folder$ suffix convention (which S3Fox uses too, BTW), you can easily distinguish files from directories (see the sketch below).
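
          A minimal sketch of that distinction, using only the key names returned by a single list-bucket call; the suffix constant follows the convention described above:

            public class FolderSuffixConvention {
              static final String FOLDER_SUFFIX = "_$folder$";

              // A key like "logs/2008_$folder$" marks the empty directory "logs/2008".
              static boolean isDirectoryMarker(String key) {
                return key.endsWith(FOLDER_SUFFIX);
              }

              // Strip the marker suffix to recover the directory path.
              static String directoryPath(String key) {
                return key.substring(0, key.length() - FOLDER_SUFFIX.length());
              }
            }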

          I believe MD5 checksum should be set on s3 put (via header), and verified on s3 get.

          The code should be doing this. I agree that it's useful - in fact, the other s3 filesystem needs updating to do this too.

          Sometimes 'legacy' buckets have underscores, might consider trying to survive them.

          Thanks for the tip. The code does detect this condition, but it might be nice to try to work around it as you say (perhaps emitting a warning). Have you done this elsewhere?

          Chris K Wensel added a comment -

          It's to do with efficiency of listing directories. If you use mime type then you can't tell the difference between files and directories when listing bucket keys. So you have to query each key in a directory which can be prohibitively slow. But if you use the _$folder$ suffix convention (which S3Fox uses too BTW) you can easily distinguish files and directories.

          From what I can tell, s3service.listObjects returns an array of S3Object, where each instance already has any associated meta-data in a HashMap, Content-Type being one of them. So I think the penalty has already been paid.

          Here is the jets3t code.
          https://jets3t.dev.java.net/source/browse/jets3t/src/org/jets3t/service/model/S3Object.java?rev=1.25&view=markup

          Are you seeing different behavior, or disabling meta-data in jets3t for performance reasons? Sorry if I seem a little rusty on my jets3t API.

          The code should be doing this. I agree that it's useful - in fact, the other s3 filesystem needs updating to do this too.

          Sorry, I didn't see where the checksum was being validated on a read. I see it in NativeS3FsOutputStream but not NativeS3FsInputStream. Does Jets3t do this automatically? If so, cool.

          Have you done this elsewhere?

          I believe those are the only two values that can be munged due to an underscore in the authority.

          Tom White added a comment -

          From what I can tell, s3service.listObjects returns an array of S3Object, where each instance already has any associated meta-data in a HashMap.

          I don't think all the fields in S3Object are populated - just those returned in the list keys response. See http://docs.amazonwebservices.com/AmazonS3/2006-03-01/ListingKeysResponse.html
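
          A rough sketch of the cost difference, assuming jets3t's listObjects and getObjectDetails calls; under the mime-type scheme every key needs an extra HEAD request, while the _$folder$ suffix is visible in the listing itself:

            import org.jets3t.service.S3Service;
            import org.jets3t.service.model.S3Bucket;
            import org.jets3t.service.model.S3Object;

            public class ListingCostSketch {
              static void classify(S3Service service, S3Bucket bucket) throws Exception {
                // One request returns the keys (name, size, ETag, last modified)...
                S3Object[] listing = service.listObjects(bucket);
                for (S3Object summary : listing) {
                  // ...but not the Content-Type, so the mime-type scheme needs a HEAD per key.
                  S3Object details = service.getObjectDetails(bucket, summary.getKey());
                  boolean dirByMime = "application/x-directory".equals(details.getContentType());

                  // The suffix convention answers the same question from the listing alone.
                  boolean dirBySuffix = summary.getKey().endsWith("_$folder$");
                  System.out.println(summary.getKey() + " mime=" + dirByMime + " suffix=" + dirBySuffix);
                }
              }
            }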

          I think Jets3t does validate MD5 checksums on reads - but I'll double check.

          Chris K Wensel added a comment -

          I don't think all the fields in S3Object are populated - just those returned in the list keys response.

          good catch.

          Tom White added a comment -

          New patch that works with trunk.

          I think Jets3t does validate MD5 checksums on reads - but I'll double check.

          This isn't true: Jets3t doesn't validate MD5 checksums on reads. In fact, the stream is passed straight to the client, so it's not possible in general to validate the MD5 checksum, particularly when doing seeks, which use range GETs. Contrast this with S3FileSystem, which retrieves data in blocks, so it would be easy to add checksum validation there (I've opened HADOOP-3494 for this). For this issue, I think we should just have write-side checksum validation.
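
          To make the range-GET point concrete, a rough sketch of what a seek amounts to, using a hypothetical retrieve(key, offset) helper rather than the exact jets3t call; only part of the object crosses the wire, so the stored MD5 (which covers the whole object) cannot be checked on the read path:

            import java.io.IOException;
            import java.io.InputStream;

            public class SeekSketch {
              // Hypothetical store interface standing in for the jets3t-backed calls.
              interface NativeS3Store {
                // Open the object at the given byte offset, i.e. an HTTP range GET.
                InputStream retrieve(String key, long byteRangeStart) throws IOException;
              }

              // Seeking reopens the object at the new position with a range GET.
              // Because only bytes [pos, length) are returned, the full-object MD5
              // stored by S3 cannot be recomputed and verified here.
              static InputStream seek(NativeS3Store store, String key, long pos) throws IOException {
                return store.retrieve(key, pos);
              }
            }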

          I've also created HADOOP-3495 to address supporting underscores in bucket names.

          Doug Cutting added a comment -

          I just committed this. Thanks, Tom!


            People

            • Assignee:
              Tom White
            • Reporter:
              Tom White
            • Votes:
              0
            • Watchers:
              2
