Hadoop Common · HADOOP-13345 S3Guard: Improved Consistency for S3A · HADOOP-14457

create() does not notify metadataStore of parent directories or ensure they're not existing files

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: HADOOP-13345
    • Component/s: fs/s3
    • Labels:
      None

      Description

      Not a great test yet, but it at least reliably demonstrates the issue. LocalMetadataStore will sometimes erroneously report that a directory is empty with isAuthoritative = true when it definitely has children the metadata store should know about. It doesn't appear to happen if the children are just directories. The fact that it's returning an empty listing is concerning, but the fact that it claims to be authoritative might be a second bug.

      diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
      index 78b3970..1821d19 100644
      --- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
      +++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
      @@ -965,7 +965,7 @@ public boolean hasMetadataStore() {
         }
       
         @VisibleForTesting
      -  MetadataStore getMetadataStore() {
      +  public MetadataStore getMetadataStore() {
           return metadataStore;
         }
       
      diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
      index 4339649..881bdc9 100644
      --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
      +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
      @@ -23,6 +23,11 @@
       import org.apache.hadoop.fs.contract.AbstractFSContract;
       import org.apache.hadoop.fs.FileSystem;
       import org.apache.hadoop.fs.Path;
      +import org.apache.hadoop.fs.s3a.S3AFileSystem;
      +import org.apache.hadoop.fs.s3a.Tristate;
      +import org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata;
      +import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
      +import org.junit.Test;
       
       import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
       import static org.apache.hadoop.fs.contract.ContractTestUtils.writeDataset;
      @@ -72,4 +77,24 @@ public void testRenameDirIntoExistingDir() throws Throwable {
           boolean rename = fs.rename(srcDir, destDir);
           assertFalse("s3a doesn't support rename to non-empty directory", rename);
         }
      +
      +  @Test
      +  public void testMkdirPopulatesFileAncestors() throws Exception {
      +    final FileSystem fs = getFileSystem();
      +    final MetadataStore ms = ((S3AFileSystem) fs).getMetadataStore();
      +    final Path parent = path("testMkdirPopulatesFileAncestors/source");
      +    try {
      +      fs.mkdirs(parent);
      +      final Path nestedFile = new Path(parent, "dir1/dir2/dir3/file4");
      +      byte[] srcDataset = dataset(256, 'a', 'z');
      +      writeDataset(fs, nestedFile, srcDataset, srcDataset.length,
      +          1024, false);
      +
      +      DirListingMetadata list = ms.listChildren(parent);
      +      assertTrue("MetadataStore falsely reports authoritative empty list",
      +          list.isEmpty() == Tristate.FALSE || !list.isAuthoritative());
      +    } finally {
      +      fs.delete(parent, true);
      +    }
      +  }
       }
      
      1. HADOOP-14457-HADOOP-13345.001.patch
        7 kB
        Sean Mackrory
      2. HADOOP-14457-HADOOP-13345.002.patch
        11 kB
        Sean Mackrory
      3. HADOOP-14457-HADOOP-13345.003.patch
        16 kB
        Sean Mackrory
      4. HADOOP-14457-HADOOP-13345.004.patch
        16 kB
        Sean Mackrory
      5. HADOOP-14457-HADOOP-13345.005.patch
        16 kB
        Sean Mackrory
      6. HADOOP-14457-HADOOP-13345.006.patch
        9 kB
        Sean Mackrory
      7. HADOOP-14457-HADOOP-13345.007.patch
        9 kB
        Sean Mackrory
      8. HADOOP-14457-HADOOP-13345.008.patch
        9 kB
        Sean Mackrory
      9. HADOOP-14457-HADOOP-13345.009.patch
        9 kB
        Sean Mackrory
      10. HADOOP-14457-HADOOP-13345.010.patch
        13 kB
        Sean Mackrory

        Issue Links

          Activity

          mackrorysd Sean Mackrory added a comment -

          Attaching a patch with a better test and a fix. I was originally looking at this as a bug in the Local implementation for not creating parents, but after a quick discussion about it with @fabbri, I was convinced that that's merely an implementation detail of the DynamoDB implementation, and that it's actually the FS's responsibility to create the things it needs to exist (which makes sense, since innerMkdirs does that). I'm more or less doing something similar to innerMkdirs. Maybe they should share a common function here, but there are a few key differences we would need to preserve if we went that route:

          • I think it'd be wasteful to create the empty directory placeholder. Removing it later makes it 2 S3 round trips that aren't needed.
          • I think directories created in this manner should NOT be considered authoritative (ITestS3GuardEmptyDirs agrees with me). I think there are a few special cases where we could set it to true here, but in my head that logic gets pretty complex, and IMO it's best to just leave it as false if someone is creating the directories implicitly with create().

          There's a second issue addressed here as well (since the fix for it is inside the loop I would have had to write anyway): one could create /a/b.txt and subsequently /a/b.txt/c/d.txt, and nothing would stop you. Obviously not common in practice, but very incorrect IMO. I added a test to the S3A contract test; if this is considered part of the general FS contract we could move it up a level, but I haven't tested anything but S3. It also occurred to me that there could be applications depending on this behavior. So I fixed it while I was in the code, but I'm not entirely convinced about it myself.
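
          A minimal sketch of the offending sequence, assuming fs is a FileSystem bound to an S3A URI:

            // Both calls succeed today, even though the second one nests a
            // file underneath an existing *file* rather than a directory.
            fs.create(new Path("/a/b.txt")).close();
            fs.create(new Path("/a/b.txt/c/d.txt")).close(); // should arguably fail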

          mackrorysd Sean Mackrory added a comment -

          FWIW I tested this with the Local, DynamoDB, and Null implementations, with and without -Dauth. All tests passing except for 2 in ITestS3AContractGetFileStatus (which were failing previously, as reported in HADOOP-13345). That JIRA also reported ITestS3AContractRename.testRenamePopulatesFileAncestors failing with -Dlocal and -Dauth, and that is fixed by this patch (as are a couple of tests I added that would fail without the accompanying fix).

          fabbri Aaron Fabbri added a comment -

          Good catch on the s3a part of this Sean Mackrory.

          Pinging Steve Loughran: FYI you can create a file underneath a file. Fixing this will add round trips to create() though. This part is orthogonal to S3Guard.

          fabbri Aaron Fabbri added a comment - - edited

          Sean Mackrory, can you file a separate jira for the s3a create() fix? This JIRA's title doesn't seem to be related. (or, alternately, rename this jira to be clearer)

          fabbri Aaron Fabbri added a comment - - edited

          First, I'm happy that testing authoritative mode with LocalMetadataStore exposed a logic bug in the FS client. Very cool.

          Yep Sean Mackrory the main bug here seems to be that S3A does not report the directories it implicitly creates to the metadatastore. We could change MetadataStore to have implicit ancestor creation semantics like FileSystem#create(), but I favor the current semantics, mostly because the FS client already often has to enumerate ancestors (e.g. create and mkdir), and doing it again inside MetadataStore seems wasteful (especially since these will be network round trips, and it sometimes will not be necessary).

          In your test code from JIRA description:

          +      assertTrue("MetadataStore falsely reports authoritative empty list",
          +          list.isEmpty() == Tristate.FALSE || !list.isAuthoritative());
          

          The check of !list.isAuthoritative() is redundant: FALSE means we know it is not empty, and if you look at DirListingMetadata.isEmpty() you'll see that authoritativeness is already considered there.
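
          For reference, the tri-state logic being described works roughly like this (my paraphrase of DirListingMetadata.isEmpty(), not the actual source; the listing field name is illustrative):

            // Entries present => definitely not empty. No entries and not
            // authoritative => unknown, since the store may simply be
            // missing children that exist in S3.
            public Tristate isEmpty() {
              if (!listing.isEmpty()) {
                return Tristate.FALSE;
              }
              return isAuthoritative ? Tristate.TRUE : Tristate.UNKNOWN;
            }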

          In your patch:

          +    List<Path> metadataStoreDirs = null;
          +    if (hasMetadataStore()) {
          +      metadataStoreDirs = new ArrayList<>();
          +    }
          +    Path parent = qualify(f.getParent());
          +    while (!parent.isRoot()) {
          +      DirectoryStatus directory = checkPathForDirectory(parent);
          +      if(directory == DirectoryStatus.EXISTS_AND_IS_FILE) {
          +        throw new FileAlreadyExistsException("Parent directory " + parent +
          +            " already exists as a file");
          +      } else if (directory == DirectoryStatus.DOES_NOT_EXIST ||
          +          directory == DirectoryStatus.EXISTS_AND_IS_DIRECTORY_ON_S3_ONLY) {
          +        if (metadataStoreDirs != null) {
          +          metadataStoreDirs.add(parent);
          +        }
          +      } else if (directory ==
          +          DirectoryStatus.EXISTS_AND_IS_DIRECTORY_ON_METADATASTORE) {
          +        break;
          +      }
          +      parent = parent.getParent();
          +    }
          +    S3Guard.makeDirsOrdered(metadataStore, metadataStoreDirs, username, false);
          

          I'm wondering, both here and in your recent mkdirs() change, whether we can simplify this and just use getFileStatus() instead of your checkPathForDirectory().
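
          A rough sketch of the getFileStatus()-based walk being suggested (collectAncestorsToCreate is a hypothetical name, and this assumes getFileStatus() throws FileNotFoundException for missing paths):

            private List<Path> collectAncestorsToCreate(Path f) throws IOException {
              List<Path> dirsToCreate = new ArrayList<>();
              Path parent = qualify(f.getParent());
              while (parent != null && !parent.isRoot()) {
                try {
                  if (getFileStatus(parent).isFile()) {
                    throw new FileAlreadyExistsException("Parent directory "
                        + parent + " already exists as a file");
                  }
                  break; // found a real directory; everything above it exists
                } catch (FileNotFoundException e) {
                  dirsToCreate.add(parent); // missing: record for the store
                }
                parent = parent.getParent();
              }
              return dirsToCreate;
            }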

          mackrorysd Sean Mackrory added a comment -

          Renamed the JIRA, since the actual oversight was very different from what I first suspected. Yes I'd love Steve Loughran's input here - not only regarding the cost of this to create(), but also if this is perhaps worth pushing into all contract tests, or if there's been some FS implementation in which /a/b.txt and /a/b.txt/c/d.txt would actually be considered valid.

          I'm wondering, both here and in your recent mkdirs() change, whether we can simplify this and just use getFileStatus() instead of your checkPathForDirectory().

          My first concern there is handling cases where S3 contained data before S3Guard was used: getFileStatus would return results from S3, causing the creation of those same entries in the metadata store to be skipped.

          fabbri Aaron Fabbri added a comment -

          My first concern there is handling cases where S3 contained data before S3Guard was used: getFileStatus would return results from S3, causing the creation of those same entries in the metadata store to be skipped.

          As long as S3Guard is enabled on all clients from some point in time forward, existing files should be handled as usual: the is-empty-dir logic and the DirListingMetadata#isAuthoritative() handling should correctly handle partial state.

          stevel@apache.org Steve Loughran added a comment -

          The fact that you can create /a/b.txt/c/d.txt in a call to create() is something I'm aware of; HADOOP-13221 covers it. The current fix would be a recursive walk up the tree: very expensive. Nobody has noticed this yet, from which I conclude we don't have contract tests for it.

          The fix I'd really like is to encourage the use of createNonRecursive() or equivalent, which is why I've been pushing for the w-i-p builder() API for file creation to make the non-recursive create the default action. I suspect most people don't expect a full directory tree to be instantiated on a create(), and if they do, well, a move to a builder API will be a chance to do the mkdir if they really want it.
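
          For illustration, FileSystem already exposes a non-recursive variant; something like the following would fail fast when the parent is missing (exact overloads vary across Hadoop versions, so treat this as a sketch):

            // Throws if /a/b does not already exist, instead of implicitly
            // materializing the whole ancestor chain the way create() does.
            Path target = new Path("/a/b/c.txt");
            FSDataOutputStream out = fs.createNonRecursive(
                target,
                true,                               // overwrite
                4096,                               // buffer size
                (short) 1,                          // replication (ignored by S3A)
                fs.getDefaultBlockSize(target),
                null);                              // no progress callback
            out.close();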

          stevel@apache.org Steve Loughran added a comment -

          -1 as is

          Reviewing the patch: it does fix HADOOP-13221, but it's going to be expensive, where expensive == ~200+ ms per parent entry. It also doesn't appear to break out of the loop in the situation: dir on S3 & S3Guard disabled. On raw S3 that should be a marker where you can break successfully, relying on mkdirs() to have done the walk earlier.

          I've actually considered fixing HADOOP-13221 asynchronously: you can do the checking before the close(). After all, there's nothing to stop me manipulating the parent directory tree after create() and before close() and ending up with the wrong outcome. But since nobody has ever noticed the problem occurring in the wild, I've never been in a rush to hurt performance for a corner case which only arises once you reset your assumptions about files and directories.

          1. For raw S3, I'd like a full treewalk to be optional. Even checking the parent is expensive. It needs to bail out on that first parent == dir.
          2. Tests should go in AbstractContractCreateTest; that way we can see what fails. Add a test for the direct parent being a file, and one for a couple of entries up. This test can go straight into trunk with any fixes/opt-outs for all the blobstores.
          3. The raw S3 checks don't need to be getFileStatus calls, as we are only looking to make sure there aren't parent files (with the absolute paths), not whether they are mock dirs or prefixes with children. That is, for a path "/a/b/c.txt", you only need to do a HEAD /a/b and a HEAD /a (and you could actually do them in parallel), and skip the HEAD /a/b/, HEAD /a/, LIST /a/b, LIST /a. This will cut down on the HTTP calls, especially that LIST. (See the sketch below.)
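
          A rough sketch of that minimal HEAD-only check with the AWS SDK (v1-style calls; error handling trimmed, and pathToKey is assumed to convert a Path to its bucket-relative key as S3A does):

            // Only ask: does a plain *object* exist at exactly this key?
            // A 404 is fine - the parent is then either a prefix or absent.
            private boolean parentKeyIsFile(AmazonS3 s3, String bucket, Path parent) {
              try {
                s3.getObjectMetadata(bucket, pathToKey(parent)); // HEAD request
                return true;  // a file sits where a directory should be
              } catch (AmazonServiceException e) {
                if (e.getStatusCode() == 404) {
                  return false;
                }
                throw e;
              }
            }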
          mackrorysd Sean Mackrory added a comment -

          Thanks for the feedback, Steve Loughran. My immediate concern is just preventing metadata stores from thinking they have an authoritative, empty listing for a directory when it has since had children created. The file-inside-a-file issue was noticed while I was reading what create() already validated, and it was expedient to fix at the same time - but how about we just put in a metadata-store-only fix for this issue, and I'll post my work on HADOOP-13221 as a separate issue.

          Attached a second patch. I also looked into Aaron Fabbri's comments about the need for checkPathForDirectory and convinced myself he is correct. There is also no longer any need that I can see for the delete-tracking changes to innerMkdir.

          mackrorysd Sean Mackrory added a comment - - edited

          Do also note the bounds-checking I added to Listing.java. I don't know why it suddenly manifests with this patch, but it does indeed seem to have been incorrect: there was no explicit bounds-checking in next(), so when there is no next element it throws ArrayIndexOutOfBoundsException instead of NoSuchElementException as the tests expect.
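
          A minimal sketch of the kind of guard described (illustrative only; the real iterator class and field names in Listing.java may differ):

            @Override
            public FileStatus next() {
              if (!hasNext()) {
                // Iterator contract: signal exhaustion explicitly rather
                // than letting the array access throw AIOOBE.
                throw new NoSuchElementException();
              }
              return statuses[index++];
            }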

          stevel@apache.org Steve Loughran added a comment -

          You know we do something in BlockOutputStream when finalizing a write? Or at least I do in the committer branch.

          stevel@apache.org Steve Loughran added a comment -

          OK, I am effectively seeing this in my committer tests, where the file s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS exists, but an attempt to list the parent dir fails as a delete marker is found instead.

          2017-06-02 19:59:19,791 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:innerGetFileStatus(1899)) - Getting path status for s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS  (cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS)
          2017-06-02 19:59:19,791 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3guard.MetadataStore (LocalMetadataStore.java:get(151)) - get(s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS) -> file  s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS 3404    UNKNOWN  false S3AFileStatus{path=s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS; isDirectory=false; length=3404; replication=1; blocksize=1048576; modification_time=1496429958524; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} isEmptyDirectory=FALSE
          2017-06-02 19:59:19,792 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:innerListStatus(1660)) - List status for path: s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc
          2017-06-02 19:59:19,792 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:innerGetFileStatus(1899)) - Getting path status for s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc  (cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc)
          2017-06-02 19:59:19,792 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3guard.MetadataStore (LocalMetadataStore.java:get(151)) - get(s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc) -> file  s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc 0       UNKNOWN  true  FileStatus{path=s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc; isDirectory=false; length=0; replication=0; blocksize=0; modification_time=1496429951655; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}
          2017-06-02 19:59:19,801 [dispatcher-event-loop-6] INFO  spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(54)) - MapOutputTrackerMasterEndpoint stopped!
          2017-06-02 19:59:19,811 [dispatcher-event-loop-3] INFO  scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint (Logging.scala:logInfo(54)) - OutputCommitCoordinator stopped!
          2017-06-02 19:59:19,814 [ScalaTest-main-running-S3ACommitDataframeSuite] INFO  spark.SparkContext (Logging.scala:logInfo(54)) - Successfully stopped SparkContext
          - Dataframe+partitioned *** FAILED ***
            java.io.FileNotFoundException: Path s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc is recorded as deleted by S3Guard
            at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1906)
            at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1881)
            at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1664)
            at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1640)
            at com.hortonworks.spark.cloud.ObjectStoreOperations$class.validateRowCount(ObjectStoreOperations.scala:340)
            at com.hortonworks.spark.cloud.CloudSuite.validateRowCount(CloudSuite.scala:37)
            at com.hortonworks.spark.cloud.s3.commit.S3ACommitDataframeSuite.testOneFormat(S3ACommitDataframeSuite.scala:111)
            at com.hortonworks.spark.cloud.s3.commit.S3ACommitDataframeSuite$$anonfun$1$$anonfun$apply$2.apply$mcV$sp(S3ACommitDataframeSuite.scala:71)
            at com.hortonworks.spark.cloud.CloudSuiteTrait$$anonfun$ctest$1.apply$mcV$sp(CloudSuiteTrait.scala:66)
            at com.hortonworks.spark.cloud.CloudSuiteTrait$$anonfun$ctest$1.apply(CloudSuiteTrait.scala:64)
          

          The _SUCCESS file is just created by a normal create/write/close sequence, so the patch here would fix it. But other operations are simply completing a multipart PUT call, and not following the same path.

          What everything does do is call S3AFileSystem.finishedWrite(), which is used to set the final status and length. Maybe this is where the final operations should take place, specifically (see the sketch after this list):

          1. Normal FS: Delete any/all parent directories
          2. s3guard: add the parent entries.
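
          A rough sketch of how finishedWrite() might pick up both duties (the helpers marked "assumed" below are illustrative names, not the real API):

            void finishedWrite(String key, long length) {
              Path p = keyToQualifiedPath(key);                // assumed helper
              // Raw S3: drop any fake directory markers above the new file.
              deleteUnnecessaryFakeDirectories(p.getParent());
              if (hasMetadataStore()) {
                // S3Guard: record the file, then make every missing ancestor
                // directory visible in the store.
                metadataStore.put(new PathMetadata(newFileStatus(p, length))); // assumed
                S3Guard.makeDirsOrdered(metadataStore, missingAncestorsOf(p),  // assumed
                    username, false);
              }
            }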
          stevel@apache.org Steve Loughran added a comment -

          Update: looked at finishedWrite in more detail. It does

          1. call deleteUnnecessaryFakeDirectories(p.getParent());
          2. if s3guard is enabled, update the metastore with the new value of the file.

          We can/should still have the safety checks in the create call for parents being files, but we can maybe postpone the path creation until the file is written (or do it again). FWIW, I'd like the creation code to be kept out of S3AFS if possible, just because it's getting so big & complex. I've pulled writeOperationsHelper out in the committer branch, but there's still a lot of complexity in the core FS now that everything is metastore-guarded.

          I think we should consider that there's another test missing here: a sequence of:

          1. mkdir(parent)
          2. delete(parent)
          3. touch(child)
          4. stat(child)
          5. ls(parent)

          Similarly, do one for calling create() on a path whose parent wasn't created and then deleted, but simply never existed (a sketch of both sequences follows the list below):

          1. touch(child)
          2. stat(child)
          3. ls(parent)
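
          A minimal JUnit-style sketch of the two sequences (test names and assertion details are mine, purely illustrative):

            @Test
            public void testCreateAfterParentDeleted() throws Exception {
              FileSystem fs = getFileSystem();
              Path parent = path("testCreateAfterParentDeleted");
              Path child = new Path(parent, "child");
              fs.mkdirs(parent);
              fs.delete(parent, true);
              ContractTestUtils.touch(fs, child);            // create under deleted parent
              assertTrue(fs.getFileStatus(child).isFile());  // stat(child)
              assertEquals(1, fs.listStatus(parent).length); // ls(parent) sees it
            }

            @Test
            public void testCreateUnderNeverCreatedParent() throws Exception {
              FileSystem fs = getFileSystem();
              Path parent = path("testCreateUnderNeverCreatedParent");
              Path child = new Path(parent, "child");
              ContractTestUtils.touch(fs, child);            // parent never existed
              assertTrue(fs.getFileStatus(child).isFile());
              assertEquals(1, fs.listStatus(parent).length);
            }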
          fabbri Aaron Fabbri added a comment -

          Thanks for the detail here Steve Loughran.

          FWIW, I'd like the creation code to be kept out of S3AFS if possible, just because it's getting so big & complex. I've pulled writeOperationsHelper out in the committer branch, but there's still a lot of complexity in the core FS now that everything is metastore-guarded.

          By "creation code", I assume you mean the part where create() results in all ancestor dirs getting created in the MetadataStore. I generally agree.

          Can this live in S3Guard.java, Sean Mackrory, or is it awkward?

          mackrorysd Sean Mackrory added a comment -

          I filed HADOOP-14484 for the missing test case (and will fix it if it does indeed fail on Local or Dynamo).

          I'll look at moving this to S3Guard.java - although we should be able to save some operations by solving this problem and the file-as-parent-dir check in the same loop, rather than doing an S3Guard-specific check in one place and always another check elsewhere.

          fabbri Aaron Fabbri added a comment -

          If we end up adding an "ancestor is a directory" check to create() in the future, we could accumulate the list of missing parents during the ancestor checks and pass them through the operation to finishedWrite() as a precomputed list of the things to create in the metadatastore. It widens some race conditions around other clients modifying our directory tree, but it seems like it would be optimal WRT round trips. We'd have MetadataStore reads, then writing the outputstream, then close()->finishedWrite() does MetadataStore writes.
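
          Sketching the shape of that idea (names are illustrative, not a concrete API proposal):

            // At create() time: one pass of MetadataStore reads.
            List<Path> missingAncestors = collectMissingAncestors(ms, path);

            // ... stream the object bytes to S3 ...

            // At close() -> finishedWrite() time: batched MetadataStore writes.
            S3Guard.makeDirsOrdered(ms, missingAncestors, username, false);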

          mackrorysd Sean Mackrory added a comment -

          Attaching a patch that adds all the test cases Steve mentioned (rather than doing HADOOP-14484, I'll just do it here - it was all tested together, and this makes it pass with the Local implementation), moves the logic to S3Guard.java, and hooks it into finishedWrite.

          It does still include an S3Guard-specific check for whether the parent is a file. The alternative is to just create ALL parent directories without checking whether they exist or not. What do others think about that? I would bet that what I have now is going to do more unnecessary round trips, but as Aaron Fabbri said, we can possibly do a bunch of reads at create() time and use that to do only the necessary writes at finishedWrite() time. For now I feel like this addresses the more serious correctness issues a little more safely, but I'd be open to just creating all parent directories without the sequence of reads.

          I successfully tested the new ContractCreate test on everything but Aliyun, FTP, and Swift. I think it's a much safer bet than the other test I posted to HADOOP-13221.

          stevel@apache.org Steve Loughran added a comment -

          Having a check up the tree during create is going to be OK for S3Guard, as the cost of the check is low (can we do the entire check as one batch? I don't know the DDB API). It's only against raw S3 that the cost becomes excessive enough to make it hard to justify.

          I'd prefer the do/while at line 1674 to have the while (fPart != null) at the top.

          Probably just some personal preference, as I can't see a codepath where you'd get to the FNFE with f.getParent() == null (as it would imply that listing the root dir failed), but...

          mackrorysd Sean Mackrory added a comment -

          Attaching a patch that switches from do/while to while. I too prefer the while, but I was mainly reverting to the logic innerMkdirs used to have.

          We can do a batch lookup in DynamoDB, but not with the existing MetadataStore interface. The DynamoDB implementation is actually doing a lot of work to maintain the all-nodes-reachable-from-root invariant, and as it turns out the FileSystem is more or less expected to maintain that itself anyway. So I'd like to have a larger discussion about eliminating some of those duplicated actions anyway. I'd propose we commit this for correctness now, and I'll open a second JIRA for adding a batch lookup (or an idempotent ensure-dirs-exist operation - sketched below) and for investigating eliminating similar functionality from inside the DynamoDB implementation itself.
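
          For illustration, such an operation might look roughly like this on the MetadataStore interface (entirely hypothetical - no such method exists yet; defining it would be the point of the follow-up JIRA):

            public interface MetadataStore extends Closeable {
              // ... existing operations ...

              /**
               * Ensure every directory from the root down to dir exists in
               * the store. Implementations may batch this into one round
               * trip (e.g. a DynamoDB batched write). Idempotent.
               */
              void ensureDirsExist(Path dir, String owner) throws IOException;
            }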

          mackrorysd Sean Mackrory added a comment - - edited

          Also, I looked a bit deeper into the failure that is fixed by catching ArrayIndexOutOfBoundsException in ProvidedFileStatusIterator.next() and throwing NoSuchElementException instead. ITestS3AContractGetFileStatus fails if -Ds3guard -Dauth is provided (which is what I started doing when I discovered this bug), and has done so since HADOOP-13926. That really just surfaced the existing incorrectness, though: the iterator should not be throwing ArrayIndexOutOfBoundsException when it's out of elements.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 0m 8s HADOOP-14457 does not apply to HADOOP-13345. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Issue HADOOP-14457
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12871609/HADOOP-14457-HADOOP-13345.004.patch
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12455/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          mackrorysd Sean Mackrory added a comment -

          Rebasing on the latest merge...

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          0 mvndep 5m 50s Maven dependency ordering for branch
          +1 mvninstall 15m 12s HADOOP-13345 passed
          +1 compile 15m 40s HADOOP-13345 passed
          +1 checkstyle 1m 58s HADOOP-13345 passed
          +1 mvnsite 1m 42s HADOOP-13345 passed
          -1 findbugs 1m 34s hadoop-common-project/hadoop-common in HADOOP-13345 has 19 extant Findbugs warnings.
          -1 findbugs 0m 39s hadoop-tools/hadoop-aws in HADOOP-13345 has 1 extant Findbugs warnings.
          +1 javadoc 1m 18s HADOOP-13345 passed
          0 mvndep 0m 20s Maven dependency ordering for patch
          +1 mvninstall 1m 7s the patch passed
          +1 compile 15m 10s the patch passed
          +1 javac 15m 10s the patch passed
          -0 checkstyle 2m 23s root: The patch generated 5 new + 12 unchanged - 0 fixed = 17 total (was 12)
          +1 mvnsite 2m 0s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 3m 8s the patch passed
          +1 javadoc 1m 23s the patch passed
          -1 unit 9m 13s hadoop-common in the patch failed.
          +1 unit 1m 19s hadoop-aws in the patch passed.
          +1 asflicense 0m 50s The patch does not generate ASF License warnings.
          103m 0s



          Reason Tests
          Failed junit tests hadoop.security.TestRaceWhenRelogin



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14457
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12871703/HADOOP-14457-HADOOP-13345.005.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 06d542d404d1 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision HADOOP-13345 / 76b0751
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12459/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12459/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-aws-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12459/artifact/patchprocess/diff-checkstyle-root.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12459/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12459/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12459/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          mackrorysd Sean Mackrory added a comment -

          So none of these are related to this patch. I wonder if we recently updated our findbugs version and it's flagging a bunch of stuff it didn't previously? I filed HADOOP-14499 to take care of the 1 findbugs issue in hadoop-aws.

          fabbri Aaron Fabbri added a comment -

          Sean Mackrory how do you feel about splitting this patch up into multiple jiras? I'm happy to help.

          1. Add new create contract test that validates ancestor directory creation.
          2. ProvidedFileStatusIterator#next() may throw IndexOutOfBoundsException
          3. Simplify mkdir()
          4. S3Guard: populate ancestors in create() since S3AFileSystem doesn't do it.

          mackrorysd Sean Mackrory added a comment -

          +1 - works for me. I'm stepping out shortly, but can split that up tonight unless you want to do it sooner.

          fabbri Aaron Fabbri added a comment -

OK, I created the following JIRAs for the items I listed in my previous comment above:

          1. HADOOP-14506
          2. HADOOP-14504 (added unit test and posted patch)
          3. HADOOP-14505
          4. This JIRA

          mackrorysd Sean Mackrory added a comment -

There's a bit of overlap between this and HADOOP-14505, so I'm doing this one first, and HADOOP-14505 will need to depend on this. Attaching a patch with the other parts removed. Re-tested against us-west-2 with every valid combination of -Ds3guard, -Dauth and -Ddynamo.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 14m 3s HADOOP-13345 passed
          +1 compile 0m 23s HADOOP-13345 passed
          +1 checkstyle 0m 14s HADOOP-13345 passed
          +1 mvnsite 0m 24s HADOOP-13345 passed
          -1 findbugs 0m 31s hadoop-tools/hadoop-aws in HADOOP-13345 has 1 extant Findbugs warnings.
          +1 javadoc 0m 14s HADOOP-13345 passed
          +1 mvninstall 0m 20s the patch passed
          +1 compile 0m 18s the patch passed
          +1 javac 0m 18s the patch passed
          -0 checkstyle 0m 12s hadoop-tools/hadoop-aws: The patch generated 3 new + 7 unchanged - 0 fixed = 10 total (was 7)
          +1 mvnsite 0m 21s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 36s the patch passed
          +1 javadoc 0m 12s the patch passed
          +1 unit 0m 40s hadoop-aws in the patch passed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          20m 19s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14457
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872081/HADOOP-14457-HADOOP-13345.006.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 3392d8cbf483 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision HADOOP-13345 / 6f6a61b
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12481/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-aws-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12481/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-aws.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12481/testReport/
          modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12481/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          mackrorysd Sean Mackrory added a comment -

This needed a minor tweak to be rebased on top of some other work that went in earlier today. This includes a direct test of the correct create() behavior in the MetadataStore, and also a test that create() over the top of tombstones doesn't have any other issues.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 23s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 14m 13s HADOOP-13345 passed
          +1 compile 0m 22s HADOOP-13345 passed
          +1 checkstyle 0m 16s HADOOP-13345 passed
          +1 mvnsite 0m 24s HADOOP-13345 passed
          -1 findbugs 0m 32s hadoop-tools/hadoop-aws in HADOOP-13345 has 1 extant Findbugs warnings.
          +1 javadoc 0m 16s HADOOP-13345 passed
          +1 mvninstall 0m 20s the patch passed
          +1 compile 0m 19s the patch passed
          +1 javac 0m 19s the patch passed
          -0 checkstyle 0m 13s hadoop-tools/hadoop-aws: The patch generated 3 new + 7 unchanged - 0 fixed = 10 total (was 7)
          +1 mvnsite 0m 22s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 34s the patch passed
          +1 javadoc 0m 12s the patch passed
          +1 unit 0m 39s hadoop-aws in the patch passed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          20m 43s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14457
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872162/HADOOP-14457-HADOOP-13345.007.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux aa2f0414d8c1 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision HADOOP-13345 / 30baa08
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12488/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-aws-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12488/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-aws.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12488/testReport/
          modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12488/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          fabbri Aaron Fabbri added a comment -

Thank you for being flexible and splitting this up, Sean Mackrory; it is much easier to digest.

          In S3Guard#addAncestors(), which is called from S3AFileSystem#finishedWrite():

          +      PathMetadata directory = metadataStore.get(parent);
          +      if (directory == null || directory.isDeleted()) {
          +        metadataStoreDirs.add(parent);
          +      } else {
          +        if (directory.getFileStatus().isDirectory()) {
          +          break;
          +        } else {
          +          throw new FileAlreadyExistsException("Parent directory " + parent +
          +              " already exists as a file");
          +        }
          

I would probably just make this whole else block do a break, and remove the inner if/else that checks whether the ancestor is a directory. A couple of reasons why I think enforcing that ancestors have to be directories here is a bad idea:

1. It fails on close(), after the data has already been written. This is the big one: a failure should mean the file was not materialized.
2. It will only fail when S3Guard is enabled and we happen to have metadata loaded for the ancestor file. So the enforcement is situational.
          3. The FS client should be doing this, if it is done at all, IMO.

I can't think of any issues this causes with MetadataStore implementations if we just break here and ignore file ancestors. Shout if you think otherwise.
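For concreteness, a minimal sketch of the simplified walk (an illustration assuming the surrounding loop from the quoted diff, not the committed code):

Path parent = qualifiedPath.getParent();
while (parent != null && !parent.isRoot()) {
  PathMetadata directory = metadataStore.get(parent);
  if (directory == null || directory.isDeleted()) {
    // Unknown or tombstoned ancestor: queue it for creation.
    metadataStoreDirs.add(parent);
  } else {
    // Any live entry, file or directory, ends the walk; nothing is
    // thrown on close() any more.
    break;
  }
  parent = parent.getParent();
}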

Also, for the createNonRecursive() case, this work should not be necessary. We'd need some operation context / plumbing to know that in finishedWrite(), though (I think). I was already thinking of plumbing some more state through the operation (to record stats on inconsistency, a la HADOOP-14467 and HADOOP-14468), so we can address this part with a followup JIRA assigned to me.

For the new test case testCreatePopulatesFileAncestors(), can you move it to another test class? I recently moved some unrelated tests out of this class and clarified its purpose in the class comment. I would give your case its own class, actually, with a class comment to the effect of:

          /**
           * Test that ancestor creation during S3AFileSystem#create() is properly accounted 
           * for in the MetadataStore.  This should be handled by the FileSystem, and be a FS contract
           * test, but S3A does not handle ancestors on create(), so we need to take care in the S3Guard 
           * code to do the right thing.  This may change: See HADOOP-13221 for more detail.
           */
          

          About the new boolean parameter here:

          +   * @param authoritative Whether to mark new directories as authoritative.
              */
             public static void makeDirsOrdered(MetadataStore ms, List<Path> dirs,
          -      String owner) {
          +      String owner, boolean authoritative) {
          

This makes sense. An unfortunate side effect of the FS not enforcing this stuff. Usually if we're making directories, we by definition know their full contents (the leaf is empty, and each ancestor has one entry for its child). In this case we do not know if the directories already exist on the underlying FS, so we have to mark them as "fully cached: unknown". Ok.
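So, per the quoted signature, the ancestors back-filled from create() would presumably be put with authoritative set to false (variable names here are illustrative):

// Directories deduced in finishedWrite(): contents unknown, so they
// must not be marked authoritative.
S3Guard.makeDirsOrdered(ms, metadataStoreDirs, username, false);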

          +      assertTrue("MetadataStore falsely reports authoritative empty list",
          +          list.isEmpty() == Tristate.FALSE);
          

          This should pass with current implementations, but {{ != Tristate.TRUE }} is more precise. A MetadataStore is not required to know directory emptiness.
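Spelled out, the suggested form would be:

// Only assert that the store does not claim an authoritative empty
// listing; knowing emptiness at all (vs. UNKNOWN) is optional.
assertTrue("MetadataStore falsely reports authoritative empty list",
    list.isEmpty() != Tristate.TRUE);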

          mackrorysd Sean Mackrory added a comment -

Yeah, I was on the fence about doing the parent-is-a-file check - I agree with all your reasons, so it's removed. If there's a file as a parent, I do think it has the potential to cause issues with MetadataStore implementations, as that violates the invariant that everything is reachable by traversing from root. Not sure what operations that screws up, though.

Created a new test class for testCreatePopulatesFileAncestors. I'm actually now thinking we could combine the 2 new test classes - they both test create() functionality - one checking ancestors, one checking the same thing but overwriting tombstones.

          Retested parallel tests with no options, and with "-Ddynamo -Ds3guard -Dauth".

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 15m 24s HADOOP-13345 passed
          +1 compile 0m 25s HADOOP-13345 passed
          +1 checkstyle 0m 16s HADOOP-13345 passed
          +1 mvnsite 0m 28s HADOOP-13345 passed
          -1 findbugs 0m 36s hadoop-tools/hadoop-aws in HADOOP-13345 has 1 extant Findbugs warnings.
          +1 javadoc 0m 16s HADOOP-13345 passed
          +1 mvninstall 0m 24s the patch passed
          +1 compile 0m 22s the patch passed
          +1 javac 0m 22s the patch passed
          -0 checkstyle 0m 14s hadoop-tools/hadoop-aws: The patch generated 2 new + 7 unchanged - 0 fixed = 9 total (was 7)
          +1 mvnsite 0m 26s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 44s the patch passed
          +1 javadoc 0m 14s the patch passed
          +1 unit 0m 46s hadoop-aws in the patch passed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          22m 31s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14457
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872260/HADOOP-14457-HADOOP-13345.008.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 13996aa2862a 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision HADOOP-13345 / 6a06ed8
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12498/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-aws-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12498/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-aws.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12498/testReport/
          modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12498/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 14m 13s HADOOP-13345 passed
          +1 compile 0m 22s HADOOP-13345 passed
          +1 checkstyle 0m 14s HADOOP-13345 passed
          +1 mvnsite 0m 24s HADOOP-13345 passed
          -1 findbugs 0m 29s hadoop-tools/hadoop-aws in HADOOP-13345 has 1 extant Findbugs warnings.
          +1 javadoc 0m 15s HADOOP-13345 passed
          +1 mvninstall 0m 20s the patch passed
          +1 compile 0m 19s the patch passed
          +1 javac 0m 19s the patch passed
          +1 checkstyle 0m 11s the patch passed
          +1 mvnsite 0m 22s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 34s the patch passed
          +1 javadoc 0m 12s the patch passed
          +1 unit 0m 39s hadoop-aws in the patch passed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          20m 23s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14457
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872269/HADOOP-14457-HADOOP-13345.009.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 4bc9573eb806 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision HADOOP-13345 / 6a06ed8
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12499/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-aws-warnings.html
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12499/testReport/
          modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12499/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          fabbri Aaron Fabbri added a comment -

          Thank you for updating your patch Sean Mackrory. The v9 code looks good.

I would +1 this except for one concern, which I should have mentioned earlier had it occurred to me: this will likely have a negative performance impact for S3Guard w/ Dynamo. Correct me if I am wrong, but the main purpose of this code is to fix the fact that S3A's "broken but fast" create() implementation breaks authoritative (fully-cached) directory listings for the MetadataStore (since the S3A client is not reporting directory creations which affect said authoritative listings of ancestors).

          In terms of performance with the DynamoDBMetadataStore, however, this code is bad for two reasons:
          1. DynamoDBMetadataStore doesn't implement authoritative listings.
          2. DynamoDBMetadataStore already populates ancestors due to internal implementation details.

          I do think authoritative listing is valuable though. Not only for future performance gains we can get by short-circuiting S3 list, but for the extra testing and logic checks we get from having the LocalMetadataStore and associated contract tests around it.

          I wonder if this is the time to introduce a capabilities query interface on MetadataStore. Then we could rename the function to S3Guard#addAncestorsIfAuthoritative(..) and have it look like this:

            /** Add ancestors of qualifiedPath to MetadataStore iff it supports authoritative listings.*/
            public static void addAncestorsIfAuthoritative(MetadataStore metadataStore,
                Path qualifiedPath, String username) throws IOException {
              if (! metadataStore.getOption(SUPPORTS_AUTHORITATIVE_DIRS)) {
                return;
              }
            ...
          

          I also like the capabilities query idea because it allows us to write stricter MetadataStore contract tests.

          Thanks for your patience on the back and forth on this. I'd be happy to +1 this and follow up quickly with a patch that adds the capability stuff + the new if() condition above, if you and Steve Loughran or Mingliang Liu agree with my comments here.

          mackrorysd Sean Mackrory added a comment -

A capabilities-based approach seems to me to be far more complicated than just deciding whether it is or isn't a MetadataStore implementation's job to track parent directories. I really don't see a strong argument against having them do it that doesn't also apply to having to track individual capabilities. But if we really don't want them to do it, which was discussed at the beginning here, we can stop having DynamoDB track it and ensure that it's always done by the FS.

          fabbri Aaron Fabbri added a comment -

Thanks for the interesting discussion, Sean Mackrory.

          A capabilities-based approach seems to me to be far more complicated

I think capabilities APIs are good practice for any interfaces that have optional features. They make things explicit and improve testability. IMO FileSystem should have something like this as well (e.g. HADOOP-9565 and HDFS-11644 come to mind).

than just deciding whether it is or isn't a MetadataStore implementation's job to track parent directories.

          By "parent directories", I assume you mean ancestor directories all the way to the root. To be clear we have decided what MetadataStore's contract is. create() is the problem here.

We could change this, adding a requirement that for any put(path), the MetadataStore must account for all ancestor directories in the whole path. Currently the contract is not a must but a may. Reasons for this:

          1. runtime complexity. Consider an FS client doing this:

          MetadataStore.put(/a/b/c/d)
          .. put(/a/b/c)
          .. put(/a/b)
          .. put(/a)

if each put() does the work of writing ancestors, that work is wasted (~2x round trips for any DB client, or depth^2 if it is dumb). I want to add more MetadataStore backends in the future, and I'm not sure they will have the same internal requirement that our Dynamo schema does (i.e. that all ancestors must be tracked anyway).

2. The FS client should be doing the ancestor enumeration anyway (see mkdirs()). Why force every MetadataStore to repeat it?

MetadataStore is a trailing log of metadata modifications that have been made to a FileSystem. We expect an FS to be able to tell us what it is doing, or it is broken (thus the bug here). I don't think putting a band-aid on create() is a compelling reason to add more requirements to any future MetadataStore implementations. create() is still broken here (i.e. if a file is an ancestor) either way.

TL;DR: we could add the requirement if it makes sense, but I don't think it buys us much here. I also feel like the implicit ancestor creation semantics of Hadoop FS were not the best decision, and am not eager to have them spread to the MetadataStore interface.

          mackrorysd Sean Mackrory added a comment - - edited

Had a lot of back-and-forth about this offline with Aaron Fabbri. Notes are below. Would like some input from Mingliang Liu / Steve Loughran about this since either solution involves adding to the MetadataStore interface. The gist is this:

          • I think it's reasonable to only do this if authoritative mode is enabled.
• I'm concerned about code complexity: having the DynamoDB implementation add all ancestors, then replicating the same thing under certain conditions in the FS, only for implementations that don't do this. I suspect future implementations are more likely than not to require the tree to be traversable from root like DynamoDB does. I'm actually not clear why Local doesn't have this same problem, unless it's just that we don't care about doing a full-table scan on an in-memory hashmap.
          • Aaron Fabbri is concerned about performance and wants the MetadataStore interface to be as simple as possible to make future implementations easy to add, so having more of the logic live in the FS when possible (but only execute when necessary) is desirable.

          There are 2 options we discussed:

• Add putRecursive to the MetadataStore interface and use that in finishedWrite() instead of having it loop up the directory hierarchy itself. For Dynamo or similar implementations, it can just wrap put(). mkdirs() might be able to use it, although it already traverses the tree making sure none of the parents are a file anyway, similar to what create() should eventually do.
• Go with the change as-is, but wrap the finishedWrite() changes in a capabilities check so that the FS only adds ancestors if the store needs them and won't do it itself.

Personally, I prefer the former (sketched below). It addresses my concerns about adding recursive operations at multiple points in the code (I think it's more likely than not that any future implementations will have the same need as DynamoDB to have the entire tree be traversable from root anyway), the FS doesn't need to decide whether or not it should recurse because it can always assume the underlying operation will do it the first time it's called, and the one place that operation is implemented can take advantage of any store-specific optimizations rather than something unaware of the storage layer just making repeated calls.
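A hypothetical shape of that first option (name and javadoc are illustrative only; nothing like this was committed):

/**
 * Save metadata for path, creating entries for any missing ancestor
 * directories as a side effect. Implementations whose schema already
 * tracks ancestors (e.g. DynamoDB) can implement this as a plain put().
 */
void putRecursive(PathMetadata meta) throws IOException;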

          On the other hand, if we feel like we're going to eventually need a capabilities API anyway, we may as well add it now and use that to solve this problem too.

          Any thoughts, Steve Loughran / Mingliang Liu?

          Any nuances I didn't call out here, Aaron Fabbri?

          liuml07 Mingliang Liu added a comment -

Nice discussion here! This week I'll be at the DataWorks Summit 2017 in San Jose. I'll review later this week or early next week. I don't want to block, though, so if you guys reach consensus please go commit. Thanks.

          fabbri Aaron Fabbri added a comment -

Good summary, Sean Mackrory. One thing I'll add about my preference here (which is to commit your patch and add a capabilities check as a performance optimization, pending fixing create()) is the design of the MetadataStore interface. I prefer it to be:

          1. Simple to implement.
          2. Not specific to S3A internals.
          3. Explicit in the way state is mutated (a trailing log of FS metadata changes).

          For #3, I prefer the basic contract to remain that an FS client reports metadata changes it makes explicitly. When create() happens, for example, if the FS client is creating ancestor directories, I'd like to see those creations reported directly via the API, instead of requiring all implementations to deduce them from the path. Besides the design preference, consuming the stream of modifications as they happen also makes things like pruning or TTL expiry possible as we see directory creation when it happens. We have no idea of the age of a directory path, for instance, if we allow the FS to skip reporting its creation.

          In my experience explicit APIs like this are easier to reason about and less likely to hide bugs and allow for general sloppiness.

          Finally, I do not think this case of create() being broken is compelling enough to change the above design that exists today. I also don't feel like it warrants the additional complexity of a putRecursive() method on MetadataStore (which also suffers from the TTL issue). Separation of concerns here, to me, means S3A's broken create() does not infect MetadataStore.

          fabbri Aaron Fabbri added a comment -

          Sean Mackrory and I just chatted about this and I think we came up with a better option. Instead of adding a "recursive" put to MetadataStore, I suggested a batched put.

The main thing I disliked about adding a new method MetadataStore#putRecursive(PathMetadata p) is that it is not explicit about which ancestors are created. It is the sort of sloppy method that says "I'm creating this thing, make sure the ancestors exist" instead of an explicit "here are the paths that I'm creating". The former makes it hard to use the MetadataStore interface for precise metadata logging.

          One of the problems with requiring the FS client to put() each path it creates is that, since dynamo has to create ancestors internally for its schema, DynamoDB ends up having to do extra round trips (both FS and dynamo making sure ancestors are created). Sean Mackrory rightfully pushed back on this. Even with the capability check I suggested (which should be correct but not trivial to reason about), Dynamo would be affected by extra round trips once we added authoritative listing support (HADOOP-14154).

We both agreed that adding a batched {{MetadataStore#put(Collection<PathMetadata> pathsToCreate)}} seemed like a better option than the capability check. The Dynamo implementation can use existing logic from its move() implementation to pre-process the list to include ancestors in the batched put list. Dynamo gets optimal round trips, now and in the future, and MetadataStore is still able to express explicit logging of metadata changes.
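A minimal sketch of that addition, assuming the signature discussed here (the committed form may differ):

/**
 * Save metadata for every path in pathsToCreate in one batched call.
 * Implementations that must track ancestors internally (e.g. the
 * DynamoDB store) may pre-process the collection to add ancestor
 * entries before writing, as move() already does.
 */
void put(Collection<PathMetadata> pathsToCreate) throws IOException;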

Thanks for the great discussion on this, Sean Mackrory; please add anything I missed.

          stevel@apache.org Steve Loughran added a comment -

This is becoming an interesting little architectural issue:

• I concur with the "auto-create parent dirs is bad" opinion; in the evolving builder API for new files, HADOOP-14365, I want it to not create parent dirs, so users of the new API can call mkdir first if they really want it.
          • we should consider whether createNonRecursive() is actually lined up right here, or whether it needs changes too. (And tests; created HADOOP-14562 there).
• we need to have an efficient codepath for the mkdirs/create sequence. One thing to consider here: if done correctly, in auth mode, the existence of a parent is sufficient to guarantee the existence of all its ancestors. Therefore you only need to walk up the tree through the nonexistent parents.
• ...and line up for that builder. So maybe the core inner create operation would take flags indicating parent dir policy: if create, walk up creating parents; if not, fail fast if parent != dir (a sketch follows below).
          • As noted, multiple round trips are bad...not just in perf but in billable IOPs.
• and w.r.t. capability checks, static interface capabilities cause problems in subclasses; witness Syncable. Some dynamic probe is the only solution agile enough to cope, IMO.

          So: how about we optimise for the create-parent-must-exist and create-parent-probably-exists use cases, and treat the other one (no parent tree) as an outlier we can make expensive?
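As a rough sketch of the flag idea from the list above (names are illustrative, not from any committed API):

/** Hypothetical parent-directory policy for the core inner create. */
enum ParentDirPolicy {
  CREATE_PARENTS,     // walk up the tree, creating missing ancestors
  REQUIRE_PARENT_DIR  // fail fast if the parent is absent or is a file
}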

          mackrorysd Sean Mackrory added a comment -

So, attaching a patch that follows the design discussed by Aaron Fabbri and me in our last round of comments. This still requires that clients pass in the full list of directories that need to be created, and still has DynamoDB enforce that itself by ensuring that that set also includes the parents of everything in the set. Note that it does this in-memory, without regard for what may already exist in the database, and then writes it all in one batch (a sketch of that pre-processing follows). This is safe for now, since directories either exist or they don't and there isn't other metadata we store with them, but if we do start storing metadata (like authoritative bits) on each directory, I see no other option but to go back to doing it one level at a time, probably with a round trip each (unless we can somehow request the entire lineage at once, but I don't see how we could do that either). That would still line up with Steve Loughran's point about being a single round trip when the parent does exist, and only becoming excessively slow when it doesn't.
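A minimal sketch of that in-memory pre-processing (the helper name and the use of bare paths are illustrative; the actual patch works on PathMetadata entries):

/** Return dirs plus every ancestor, below the root, of each entry. */
static Set<Path> withAncestors(Collection<Path> dirs) {
  Set<Path> out = new HashSet<>(dirs);
  for (Path p : dirs) {
    for (Path parent = p.getParent();
         parent != null && !parent.isRoot();
         parent = parent.getParent()) {
      out.add(parent);   // purely in-memory; no store round trips
    }
  }
  return out;
}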

In addition to my usual S3N failures, I started seeing the following failures intermittently today - I would see these in perhaps 1 in 4 test runs. So far only with DynamoDB (with and without authoritative mode), but with it happening rarely I can't be sure it's DynamoDB-specific. I reverted my change but continue to see the failures:

          testConsistentRenameAfterDelete(org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency)  Time elapsed: 11.925 sec  <<< FAILURE!
          java.lang.AssertionError: Recently renamed dir should not be visible
                  at org.junit.Assert.fail(Assert.java:88)
                  at org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency.testConsistentRenameAfterDelete(ITestS3GuardListConsistency.java:237)
          
          testConsistentListAfterDelete(org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency)  Time elapsed: 7.383 sec  <<< FAILURE!
          java.lang.AssertionError: null
                  at org.junit.Assert.fail(Assert.java:86)
                  at org.junit.Assert.assertTrue(Assert.java:41)
                  at org.junit.Assert.assertFalse(Assert.java:64)
                  at org.junit.Assert.assertFalse(Assert.java:74)
                  at org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency.testConsistentListAfterDelete(ITestS3GuardListConsistency.java:191)
          
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 9s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 17m 30s HADOOP-13345 passed
          +1 compile 0m 22s HADOOP-13345 passed
          +1 checkstyle 0m 15s HADOOP-13345 passed
          +1 mvnsite 0m 25s HADOOP-13345 passed
          +1 findbugs 0m 36s HADOOP-13345 passed
          +1 javadoc 0m 16s HADOOP-13345 passed
          +1 mvninstall 0m 25s the patch passed
          +1 compile 0m 20s the patch passed
          +1 javac 0m 20s the patch passed
          -0 checkstyle 0m 14s hadoop-tools/hadoop-aws: The patch generated 19 new + 20 unchanged - 0 fixed = 39 total (was 20)
          +1 mvnsite 0m 24s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 42s the patch passed
          +1 javadoc 0m 14s the patch passed
          +1 unit 0m 45s hadoop-aws in the patch passed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          27m 19s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14457
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12876015/HADOOP-14457-HADOOP-13345.010.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 539573c05fba 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision HADOOP-13345 / 309b8c0
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12726/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-aws.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12726/testReport/
          modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12726/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          stevel@apache.org Steve Loughran added a comment -

          LGTM, tested locally too.

          +1 committed

          It'll have been missed unless people were up very early, but I amended and force-updated my first commit within a window of < 5 minutes. The second patch fixed those indentation warnings in checkstyle by moving the clause in question two spaces to the left.

          We really need contract tests to push the corners of legit creation; I'll do something quick there.

          fabbri Aaron Fabbri added a comment -

          Thanks for the patch Sean Mackrory. This looks really good.

          A couple of comments:

          @@ -251,9 +252,10 @@ public static boolean isNullMetadataStore(MetadataStore ms) {
              *              an empty, dir, and the other dirs only contain their child
              *              dir.
              * @param owner Hadoop user name.
          +   * @param authoritative Whether to mark new directories as authoritative.
              */
             public static void makeDirsOrdered(MetadataStore ms, List<Path> dirs,
          -      String owner) {
          +      String owner, boolean authoritative) {
               if (dirs == null) {
                 return;
               }
          @@ -286,7 +288,7 @@ public static void makeDirsOrdered(MetadataStore ms, List<Path> dirs,
                   children.add(new PathMetadata(prevStatus));
                 }
                 DirListingMetadata dirMeta =
          -          new DirListingMetadata(f, children, true);
          +          new DirListingMetadata(f, children, authoritative);
                 try {
                   ms.put(dirMeta);
                   ms.put(new PathMetadata(status));
          

          IIRC, the only reason I put the DirListingMetadata in addition to the PathMetadata here was to mark the directory as authoritative. (In the mkdirs case, we actually know we are creating the ancestors, so we can mark them as fully cached: we know they contain exactly one entry, the next child in the path.)

          I think we can elide the ms.put(dirMeta) here (and associated creation of DirListingMetadata) when authoritative is false. That cuts round trips in half.

          We could also use the new batched put() here, which would speed things up even more.
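
          Something like the following, reusing the identifiers from the quoted patch (a sketch of the suggestion, not committed code):

          // Only pay for the DirListingMetadata write when we actually want to
          // mark the new directory as authoritatively listed; otherwise the
          // PathMetadata entry alone is enough, halving the round trips.
          if (authoritative) {
            ms.put(new DirListingMetadata(f, children, true));
          }
          ms.put(new PathMetadata(status));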

          +  public static void addAncestors(MetadataStore metadataStore,
          +      Path qualifiedPath, String username) throws IOException {
          +    Collection<PathMetadata> newDirs = new ArrayList<>();
          +    Path parent = qualifiedPath.getParent();
          +    while (!parent.isRoot()) {
          +      PathMetadata directory = metadataStore.get(parent);
          +      if (directory == null || directory.isDeleted()) {
          +        FileStatus status = new FileStatus(0, true, 1, 0, 0, 0, null, username,
          +            null, parent);
          +        PathMetadata meta = new PathMetadata(status, Tristate.UNKNOWN, false);
          

          Being a little pedantic: this can be Tristate.FALSE, since we know these are not empty dirs; they are ancestors of other files.

          +        newDirs.add(meta);
          +      } else {
          +        break;
          +      }
          +      parent = parent.getParent();
          +    }
          +    metadataStore.put(newDirs);
          +  }
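
          In other words, the suggested one-line change would be (a sketch using the same identifiers as the quoted code):

          // These ancestors cannot be empty directories: each contains at least
          // the next path component leading down to the file.
          PathMetadata meta = new PathMetadata(status, Tristate.FALSE, false);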
          
          fabbri Aaron Fabbri added a comment -

          I see this was already committed, so I filed HADOOP-14633 for follow-up work.

          Thanks for the excellent work here Sean Mackrory and thanks for review and testing Steve Loughran.

          fabbri Aaron Fabbri added a comment -

          FYI, I did some perf tests; patch and results are posted to HADOOP-14633.


            People

            • Assignee:
              mackrorysd Sean Mackrory
            • Reporter:
              mackrorysd Sean Mackrory
            • Votes: 0
            • Watchers: 6
