Hadoop Common › HADOOP-13345 S3Guard: Improved Consistency for S3A › HADOOP-13447

Refactor S3AFileSystem to support introduction of separate metadata repository and tests.

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The scope of this issue is to refactor the existing S3AFileSystem into multiple coordinating classes. The goal of this refactoring is to separate the FileSystem API binding from the AWS SDK integration, make code maintenance easier while we're making changes for S3Guard, and make it easier to mock some implementation details so that tests can simulate eventual consistency behavior in a deterministic way.

      1. HADOOP-13447.003.patch
        42 kB
        Chris Nauroth
      2. HADOOP-13447.004.patch
        384 kB
        Chris Nauroth
      3. HADOOP-13447.005.patch
        42 kB
        Chris Nauroth
      4. HADOOP-13447-HADOOP-13446.001.patch
        152 kB
        Chris Nauroth
      5. HADOOP-13447-HADOOP-13446.002.patch
        48 kB
        Chris Nauroth


          Activity

          Chris Nauroth added a comment -

          The scope of this patch is intended to be purely refactoring pre-work with no logic changes to S3A. Depending on the final outcome, we may want to consider bringing the refactoring right into the main branches (trunk and branch-2). That will help with code maintenance on the main branches and minimize merge conflicts while merging trunk to the feature branch.

          Chris Nauroth added a comment - edited

          I'm attaching patch v001 to demonstrate what I have in mind. The test code refactoring in HADOOP-13446 is a pre-requisite for this patch.

          There are at least 2 more things I want to do with this patch before it's ready:

          1. I want to write a true unit test that mocks S3 client interactions, to prove that the patch does in fact set us up to be able to mock the S3 calls effectively (and therefore simulate eventual consistency).
          2. I have introduced a test failure in ITestS3AFileContextStatistics#testStatistics. Root cause is that handling of FileSystem.Statistics through DelegateToFileSystem is a bit funky in terms of scope/lifetime of that stats instance. I haven't found the best fix yet though. All other existing tests are passing.

          Here is a summary of changes broken down by significant classes:

          • S3AFileSystem: This is now a much smaller class. It will be responsible for initializing an S3Store, which encapsulates the S3 calls, and a concrete subclass of AbstractS3AccessPolicy, which will control how client calls coordinate with S3 and optionally other external metadata repositories.
          • S3ClientFactory: This is a factory for construction of the S3 client instance. Note that its return type is defined as AmazonS3 (an interface from the AWS SDK), not AmazonS3Client (the concrete implementation that issues HTTP calls to the S3 back-end). This is the indirection that will allow us to mock the S3 calls. Tests will be able to configure a different factory to return a mock client. The default implementation is DefaultS3ClientFactory, and all pre-existing configuration logic related to the S3 client has moved here.
          • S3Store: Much of the existing code of S3AFileSystem has moved here. This class encapsulates how client calls translate to S3 calls. This layer uses Configuration to lookup the desired S3ClientFactory implementation.
          • AbstractS3AccessPolicy / DirectS3AccessPolicy: Policy classes define how client calls coordinate between S3 calls (the S3Store) and optionally other external metadata repositories. Currently, the only concrete implementation just delegates directly to S3, which provides the same semantics as the existing S3A codebase. The scope of the various "implement access policy" sub-tasks is to implement other sub-classes that provide different semantics: caching, cross-validation for strong consistency, etc.
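The S3ClientFactory indirection described above can be sketched as follows. This is an illustrative stand-in, not the patch code: the tiny `AmazonS3Stub` interface here replaces the real AWS SDK `AmazonS3` interface so the example is self-contained, and the factory names mirror (but do not reproduce) the classes in the patch.

```java
// Stand-in for the AWS SDK's AmazonS3 interface; the real factory returns
// com.amazonaws.services.s3.AmazonS3 rather than this stub.
interface AmazonS3Stub {
    boolean doesObjectExist(String bucket, String key);
}

// The indirection point: production code and tests plug in different
// factories, selected via Configuration.
interface S3ClientFactory {
    AmazonS3Stub createS3Client();
}

// Production-style factory: would configure credentials, endpoint, proxy,
// etc. before constructing the concrete HTTP-backed client.
class DefaultS3ClientFactory implements S3ClientFactory {
    @Override
    public AmazonS3Stub createS3Client() {
        return (bucket, key) -> false;  // placeholder for the real client
    }
}

// Test factory: returns a canned client, so tests never touch the network
// and can simulate S3 behavior deterministically.
class MockS3ClientFactory implements S3ClientFactory {
    @Override
    public AmazonS3Stub createS3Client() {
        return (bucket, key) -> "visible-key".equals(key);
    }
}
```

Because callers only ever see the interface, swapping the factory is the single switch needed to replace real S3 calls with mocked ones.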
          Aaron Fabbri added a comment -

          Thanks for the work on this patch Chris Nauroth.

          Looks like the basic approach is to create a wrapper around FileSystem.

          The downside to this, as we mentioned in the design doc, is that the s3a-internal calls like getFileStatus() cannot utilize the MetadataStore. It seems like this certainly affects performance, and perhaps consistency as well. A smaller negative is that there is a lot of code churn here, which makes backports, etc. painful.

          Assuming I'm on the right track here, what should we do to fix this? For the sake of discussion, we could keep a reference to the AccessPolicy in the S3Store. This gives us a nasty circular control flow, though (AccessPolicy calls S3Store, which calls AccessPolicy.getFileStatus(), etc.).

          I feel like a cleaner mapping to the problem is to have the client (S3AFileSystem) contain a MetadataStore and/or some sort of policy object which specifies behavior. Open to other suggestions. There is still a lot of other refactoring that can happen to pare down S3AFileSystem to the core implementation of the top-level FileSystem logic.

          Chris Nauroth added a comment -

          Aaron Fabbri, thank you for looking. Great questions again.

          The downside to this, as we mentioned in the design doc, is that the s3a-internal calls like getFileStatus() cannot utilize the MetadataStore.

          My eventual (no pun intended!) plan was going to be to evolve the interfaces and separation of responsibilities for AbstractS3AccessPolicy and S3Store such that S3Store never makes its own internal metadata calls (like the internal getFileStatus calls you mentioned). I didn't take that leap in this patch, because it was already quite large.

          Taking the example of create, this might look like the policy layer fetching the FileStatus and then passing it down to S3Store#create, so that S3Store#create doesn't have to do an internal S3 call to fetch it. For the "direct" policy, that would just be a sequence of S3Store#getFileStatus + S3Store#create. For a caching policy, that could be a metadata store fetch instead.

          For some operations, there are greater challenges related to lazy fetch vs. eager fetch for these internal metadata operations. Considering rename, there are multiple (I think 3 now?) getFileStatus calls possible, but fetching them all eagerly in the policy would harm performance if it turns out S3Store#rename doesn't really need to use them all. Working out a lazy fetch strategy will bring some additional complexity into the code, so that's a risk.
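The create flow described two paragraphs up can be sketched like this. All of the names and signatures here are hypothetical (file status is reduced to a boolean for brevity); the point is only the shape: the policy layer does the metadata fetch once and hands the result down, so the store never issues its own internal lookup.

```java
// Illustrative-only interfaces; not the signatures from the actual patch.
interface S3Store {
    /** Whether an object or directory marker exists at the path. */
    boolean getFileStatus(String path);
    /** Create, reusing the status already fetched by the policy layer. */
    void create(String path, boolean alreadyExists);
}

class DirectS3AccessPolicy {
    private final S3Store store;

    DirectS3AccessPolicy(S3Store store) {
        this.store = store;
    }

    void create(String path, boolean overwrite) {
        // Policy-level fetch: for the "direct" policy this is an S3 call;
        // for a caching policy it could be a metadata store lookup instead.
        boolean exists = store.getFileStatus(path);
        if (exists && !overwrite) {
            throw new IllegalStateException(path + " already exists");
        }
        // The store receives the result, so it makes no second lookup.
        store.create(path, exists);
    }
}
```

A caching policy would override only where the status comes from; the sequencing stays identical.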

          I feel like a cleaner mapping to the problem is to have the client (S3AFileSystem) contain a MetadataStore and/or some sort of policy object which specifies behavior.

          I considered this, but I discarded it, because I thought it would introduce complicated control flow in a lot of the S3AFileSystem methods. However, refactorings like this are always subjective, and it's entirely possible that I was wrong. If you prefer, we can go that way and later revisit the larger split I proposed here in patch 001 if it's deemed necessary. I'm happy to get us rolling either way. Let me know your thoughts.

          Really the only portion of patch 001 that I consider an absolute must is the S3ClientFactory work. I think it's vital to the project that we have the ability to mock S3 interactions to simulate eventual consistency.

          Chris Nauroth added a comment -

          Here is patch v002 to show the smaller possible change that I just discussed for comparison.

          1. Just introduce S3ClientFactory.
          2. Add TestS3AGetFileStatus, which proves that we now can write tests that mock the underlying S3 calls.
          3. Also opportunistically clean up some Checkstyle warnings.
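The real test mocks the underlying S3 client interface; as a hedged stand-in, the sketch below shows only the core idea of how a mocked client makes eventual consistency deterministic: a newly "created" key stays invisible for a fixed number of probes before appearing. The class name and method are illustrative, not taken from the patch.

```java
// Deterministic eventual-consistency stub (illustrative, not the real test
// code): the stale view persists for exactly `staleProbes` calls, so a test
// can assert precisely when the inconsistency window closes.
class EventuallyConsistentClientStub {
    private final int staleProbes;  // probes that return the stale view
    private int probes;

    EventuallyConsistentClientStub(int staleProbes) {
        this.staleProbes = staleProbes;
    }

    boolean doesObjectExist(String key) {
        probes++;
        return probes > staleProbes;  // visible only after the stale window
    }
}
```

With a real mock framework the same behavior falls out of stubbing consecutive calls with different answers; either way the test controls exactly when consistency "settles", which is impossible against the live service.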
          Aaron Fabbri added a comment -

          Great discussion, I'm enjoying it (although an hour with a whiteboard would be even better).

          My eventual (no pun intended!) plan was going to be to evolve the interfaces and separation of responsibilities for AbstractS3AccessPolicy and S3Store such that S3Store never makes its own internal metadata calls (like the internal getFileStatus calls you mentioned).

          Yeah, I was trying to come up with a way to do this. My goals are clean / minimal code, and near-optimal performance (want upstream s3a to be very fast).

          The crux of the problem seems to be that the top-level logic of things like mkdirs()/rename() etc. needs to call getFileStatus(), and those internal getFileStatus() calls are subject to the same source-of-truth and retry policies as the public API. In this case, the only way I can think of to really separate the top-level logic for FileSystem ops (e.g. mkdir) from the policy on MetadataStore is to build some sort of execution plan in the top-level logic and then pass it to a policy layer to execute. You'd have three layers, roughly: FileSystem top-level, raw S3 I/O, and policy / execution. This seems like it would be slower and way over-complicated (some steps in execution are conditional, so you end up with a pseudo-language). There is also the AOP approach that s3mper took, but I think we can do better since we can modify the upstream code, and our goals are a bit more ambitious (more than just listStatus(), plus the ability to use the MetadataStore as source of truth).

          I thought it would introduce complicated control flow in a lot of the S3AFileSystem methods. However, refactorings like this are always subjective, and it's entirely possible that I was wrong.

          This is true. Seems like the classic performance vs. complexity tradeoff. I don't think you are wrong at all, just a question of priorities. I'm willing to sacrifice a little code separation for significant performance benefit, and possibly a more complete solution (e.g. if internal getFileStatus() call misses a file above mkdirs(path) because that file creation was eventually consistent).

          If you prefer, we can go that way

          That is my preference at this stage. It was not as bad as I thought it would be when we prototyped it. Also I like doing refactoring / generalization after seeing some concrete cases in code (i.e. start by complicating top level S3AFileSystem logic).

          My preference would probably change if I could think of a clean way to handle the internal getFileStatus() calls.

          and later revisit the larger split I proposed here in patch 001 if it's deemed necessary.

          I think we do need to break up S3AFileSystem. I'll be very supportive of future refactoring here, in general.

          Really the only portion of patch 001 that I consider an absolute must is the S3ClientFactory work. I think it's vital to the project that we have the ability to mock S3 interactions to simulate eventual consistency.

          Yes, I like this part of it. Being able to mock the S3 service out would be awesome, and I didn't notice any real downside.

          Chris Nauroth added a comment -

          Great, sounds like we're coming to consensus on something more like patch 002. Let's go ahead with that.

          Aaron Fabbri added a comment -

          I'm +1 on the 002 patch.

          Noticed one mention of S3Store in a comment, but not worth holding up the patch for.

          Chris Nauroth added a comment -

          Steve Loughran, this is another one that I'd like us to consider taking straight to trunk and branch-2, similar to HADOOP-13446. I think HADOOP-13252 should be committed first though, so that the refactoring work here doesn't stomp on your credential provider chain work.

          Steve Loughran added a comment -

          I've got the two patches reworked, and they should be ready to go in after another review or so.

          Chris Nauroth added a comment -

          Patch 003 is a rebase after the most recent S3A commits. This is still dependent on HADOOP-13446 going first as a pre-requisite, so I can't click Submit Patch yet. I verified again that all tests pass successfully against an S3 bucket in US-west-2.

          Steve Loughran and Aaron Fabbri, I have changed target version to 2.9.0. I'd like to proceed with getting this into trunk and branch-2.

          Chris Nauroth added a comment -

          Patch 004 is a consolidated patch of HADOOP-13446 + HADOOP-13447, not intended for review, just to give us a preview of the pre-commit results.

          Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 91 new or modified test files.
          0 mvndep 1m 43s Maven dependency ordering for branch
          +1 mvninstall 7m 43s trunk passed
          +1 compile 7m 48s trunk passed
          +1 checkstyle 1m 28s trunk passed
          +1 mvnsite 0m 33s trunk passed
          +1 mvneclipse 1m 2s trunk passed
          0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-project
          +1 findbugs 0m 29s trunk passed
          +1 javadoc 0m 28s trunk passed
          0 mvndep 0m 18s Maven dependency ordering for patch
          +1 mvninstall 0m 28s the patch passed
          +1 compile 6m 48s the patch passed
          +1 javac 6m 48s the patch passed
          -0 checkstyle 1m 31s root: The patch generated 3 new + 20 unchanged - 61 fixed = 23 total (was 81)
          +1 mvnsite 0m 43s the patch passed
          +1 mvneclipse 0m 32s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 2s The patch has no ill-formed XML file.
          0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-project
          +1 findbugs 0m 38s the patch passed
          +1 javadoc 0m 15s hadoop-project in the patch passed.
          +1 javadoc 0m 19s hadoop-tools_hadoop-aws generated 0 new + 0 unchanged - 2 fixed = 0 total (was 2)
          +1 unit 0m 15s hadoop-project in the patch passed.
          +1 unit 0m 27s hadoop-aws in the patch passed.
          +1 asflicense 0m 27s The patch does not generate ASF License warnings.
          57m 42s



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824652/HADOOP-13447.004.patch
          JIRA Issue HADOOP-13447
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle
          uname Linux 212675aefc31 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 763f049
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/10316/artifact/patchprocess/diff-checkstyle-root.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10316/testReport/
          modules C: hadoop-project hadoop-tools/hadoop-aws U: .
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10316/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Chris Nauroth added a comment -

          Checkstyle flagged "3 new" warnings. I do not intend to fix these. One is a long parameter list that isn't really a new warning, but it got flagged because I changed the type of one of the parameters. The others are complaints about exposing protected member variables without wrapper accessor methods in my new abstract base class for mock-based tests. Adding accessor methods would just make subclass code more cumbersome, and since this is private test code, there is no risk of poor encapsulation for any public interface.

          Steve Loughran added a comment -

          OK

          Chris Nauroth added a comment -

          I'm attaching patch 005. This has some small changes in TestS3AGetFileStatus for compatibility with Java 7 so that we can commit the same patch to branch-2. The Collections#emptyList calls need me to pass an explicit type argument when compiling for Java 7.

          I have clicked Cancel Patch for now, because we need to commit HADOOP-13446 first before getting a final pre-commit run on this one.
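The Java 7 inference limitation mentioned above reduces to a minimal example like the following (class and method names are illustrative): javac 7 infers `List<Object>` for a bare `Collections.emptyList()` used directly in an argument position, so the explicit type witness is required there; Java 8's target typing for poly expressions makes the witness optional.

```java
import java.util.Collections;
import java.util.List;

class EmptyListJava7Demo {
    static int count(List<String> names) {
        return names.size();
    }

    static int demo() {
        // Compiles on both Java 7 and Java 8: the explicit type witness
        // fixes the element type as String. Without <String>, javac 7
        // rejects the call (it infers List<Object>), while javac 8 accepts
        // it via target typing.
        return count(Collections.<String>emptyList());
    }
}
```

Assigning the empty list to a typed local variable first is the other Java 7-compatible workaround, since inference then proceeds from the assignment target.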

          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 8m 3s trunk passed
          +1 compile 0m 20s trunk passed
          +1 checkstyle 0m 14s trunk passed
          +1 mvnsite 0m 29s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 0m 30s trunk passed
          +1 javadoc 0m 16s trunk passed
          +1 mvninstall 0m 18s the patch passed
          +1 compile 0m 17s the patch passed
          +1 javac 0m 17s the patch passed
          -0 checkstyle 0m 12s hadoop-tools/hadoop-aws: The patch generated 3 new + 9 unchanged - 11 fixed = 12 total (was 20)
          +1 mvnsite 0m 24s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 2s The patch has no ill-formed XML file.
          +1 findbugs 0m 39s the patch passed
          +1 javadoc 0m 14s hadoop-tools_hadoop-aws generated 0 new + 0 unchanged - 2 fixed = 0 total (was 2)
          +1 unit 0m 21s hadoop-aws in the patch passed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          14m 56s



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:9560f25
          JIRA Issue HADOOP-13447
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824912/HADOOP-13447.005.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle
          uname Linux 7ba4fdfc96a6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 23abb09
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/10443/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-aws.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10443/testReport/
          modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10443/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          stevel@apache.org Steve Loughran added a comment -

          LGTM
          +1

          cnauroth Chris Nauroth added a comment -

          Aaron Fabbri and Steve Loughran, thank you for the code reviews. I have committed this to trunk and branch-2.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10398 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10398/)
          HADOOP-13447. Refactor S3AFileSystem to support introduction of separate (cnauroth: rev d152557cf7f4d2288524c222fcbaf152bdc038b0)

          • (add) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/MockS3ClientFactory.java
          • (add) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AGetFileStatus.java
          • (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
          • (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
          • (add) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ClientFactory.java
          • (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
          • (add) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/AbstractS3AMockTest.java
          • (edit) hadoop-tools/hadoop-aws/pom.xml
          • (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
          • (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3native/S3xLoginHelper.java
          • (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java
          • (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AConfiguration.java
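          The S3ClientFactory and MockS3ClientFactory files above introduce the seam that lets tests swap in a mock client. A minimal sketch of that pattern (signatures here are assumptions for illustration, not the actual Hadoop interface):

```java
import java.net.URI;

// Hypothetical sketch of a client-factory seam; the real
// S3ClientFactory/MockS3ClientFactory in the patch differ in detail.
interface ClientFactory<C> {
  C createClient(URI uri);
}

class RealClientFactory implements ClientFactory<String> {
  // In production this would construct a real AWS SDK client.
  public String createClient(URI uri) {
    return "real-client:" + uri.getHost();
  }
}

class MockClientFactory implements ClientFactory<String> {
  // Tests inject this to get deterministic behavior, e.g. to
  // simulate eventual consistency without touching S3.
  public String createClient(URI uri) {
    return "mock-client";
  }
}
```

With this seam, the filesystem asks the configured factory for its client instead of constructing one directly, so mock-based tests like TestS3AGetFileStatus can run without network access.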
          fabbri Aaron Fabbri added a comment -

          Thanks Chris Nauroth. Should we rebase the feature branch on latest trunk?

          cnauroth Chris Nauroth added a comment -

          Aaron Fabbri, I forgot to mention that I rebased the HADOOP-13345 feature branch right after I committed this.

          cnauroth Chris Nauroth added a comment -

          I'd like to propose merging this change back to branch-2.8 to simplify the process of back-porting patches and avoid merge conflicts. I'll wait a few days for comments in case anyone objects.

          fabbri Aaron Fabbri added a comment -

          +1 (non-binding)... IMO branch-2.8 should follow most of s3a development.

          stevel@apache.org Steve Loughran added a comment -

          +1 for the restructuring, even if we leave s3guard itself out of s3a in 2.8

          cnauroth Chris Nauroth added a comment -

          I cherry-picked this to branch-2.8.


            People

            • Assignee: cnauroth Chris Nauroth
            • Reporter: cnauroth Chris Nauroth
            • Votes: 0
            • Watchers: 7
