Hadoop Common
HADOOP-5438

Merge FileSystem.create and FileSystem.append

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change

      Description

      Currently, when a user wants to modify a file, the user first calls exists() to check whether the file is already there, and then calls create() or append() depending on the result.
      The code looks like:

      // Two RPCs: one for exists(), then one for create() or append().
      FSDataOutputStream out_1 = null;
      if (fs.exists(path_1)) {
         out_1 = fs.append(path_1);
      } else {
         out_1 = fs.create(path_1);
      }

      On the performance side, this involves two RPCs. On the ease-of-use side, it is not very convenient compared with the traditional open() interface.

      It becomes more complicated if an overwrite parameter is specified. I do not know whether there is a bug with 'overwrite' in 0.19; sometimes it takes a long time for overwriting creates to return. So I made the write-file code with the overwrite parameter work like:

      boolean exists = fs.exists(name);
      if (overwrite) {
          // Without this explicit delete, the overwriting create takes a long time to return.
          if (exists) {
              fs.delete(name, true);
          }
          this.out = fs.create(name, overwrite, bufferSize, replication,
                               blockSize, progress);
          this.currentRowID = 0;
      } else {
          if (!exists) {
              this.out = fs.create(name, overwrite, bufferSize, replication,
                                   blockSize, progress);
          } else {
              this.out = fs.append(name, bufferSize, progress);
          }
      }

      Some of these statements are really redundant and should not be needed, especially the delete(). But without deleting first, the overwrite takes a long time to return.

      BTW, I will create another issue about the overwrite problem. If it is not a bug at all, or a duplicate, someone please close it.

      1. Hadoop-5438-2009-03-30.patch
        31 kB
        He Yongqiang
      2. Hadoop-5438-2009-03-31.patch
        34 kB
        He Yongqiang
      3. Hadoop-5438-2009-03-31-2.patch
        34 kB
        He Yongqiang
      4. Hadoop-5438(2009-04-06).patch
        39 kB
        He Yongqiang
      5. Hadoop-5438-2009-05-5.patch
        39 kB
        He Yongqiang
      6. Hadoop-5438-2009-05-10.patch
        39 kB
        He Yongqiang
      7. Hadoop-5438-2009-05-15.patch
        34 kB
        He Yongqiang
      8. Hadoop-5438-2009-05-19.patch
        34 kB
        He Yongqiang

        Issue Links

          Activity

          Tom White added a comment -

          > file a jira for 21 and 22 to unmerge create() and append()

          I've filed HADOOP-6826 for this.

          > file a jira for 22 to come up with a new API on trunk that we're comfortable releasing (eg the better create method He described above or having two RPCs only in the case where you try to append to a file that doesn't exist).

          FileContext has a create method that supports the case described in this JIRA - is this sufficient?

          Eli Collins added a comment -

          Since this is a common FileSystem API change, we should probably discuss it in a HADOOP jira. How about we:

          • file a jira for 21 and 22 to unmerge create() and append()
          • file a jira for 22 to come up with a new API on trunk that we're comfortable releasing (eg the better create method He described above or having two RPCs only in the case where you try to append to a file that doesn't exist).
          Hairong Kuang added a comment -

          Sure, let's continue discussion at HDFS-609.

          Tsz Wo Nicholas Sze added a comment -

          > 1) This is used in two of my previous projects, which are already deployed. I don't want to go back and change things (actually I have no right to do that now).

          I am sorry that your projects have already used it. However, your projects are using a development version, which is supposed to be unstable.

          More importantly, if this is not a good API, we definitely don't want to put it on a release. Otherwise, it is much harder to fix the problem.

          Since this and the related patches were committed more than a year ago, I think it is impossible to revert them. I suggest we stop the discussion here and continue on a new issue in order to fix the problems.

          Tsz Wo Nicholas Sze added a comment -

          This changed ClientProtocol, making it an incompatible change.

          He Yongqiang added a comment -

          1) This is used in two of my previous projects, which are already deployed. I don't want to go back and change things (actually I have no right to do that now).
          2) I do not see any benefit here.

          Jakob Homan added a comment -

          I would rather get this right than have to support an incorrect, deprecated version for two releases. +1 on revert.

          Hairong Kuang added a comment -

          > This is not only about append and create; it is also about create and overwrite.
          Reverting the patch does not remove the "overwrite" functionality. If we only need to support create and overwrite, a boolean flag is good enough, as we had before this patch. I do not see the need to introduce an enum.

          Yongqiang, could you please explain whether there is anything beyond this that makes you oppose reverting this patch?

          He Yongqiang added a comment -

          I am -1 on reverting.

          Tom White added a comment -

          I think this needs more work to find an API that there is broad agreement on. It should be undertaken as a larger piece of work to consolidate the create API (this would include HADOOP-3252 as Jakob pointed out), including deprecating old methods.

          This change doesn't enable anything that wasn't previously possible, does it? In other words, backing it out wouldn't remove functionality for users of released versions of Hadoop. For example, append and overwrite operations are possible with the existing API. Since it hasn't been released yet, I'm +1 to reverting this while the create API gets more discussion and testing (especially with HDFS append).

          Eli Collins added a comment -

          Does it make sense to do a release with the current API, i.e. is the merge of the APIs ready enough to release? From the last thread of comments it sounds like it's not clear what API people want (e.g. varargs create flags).

          He Yongqiang added a comment -

          This is not only about append and create; it is also about create and overwrite.

          Eli Collins added a comment -

          HDFS-609 (create with the append flag doesn't work) is marked as a blocker for 21. Since the combined create/append API does not work and there's still some debate as to what the right API is it seems like the right thing to do here is revert this change for 21 (and on trunk) and figure out the right API before we put it in an official release. Reasonable?

          He Yongqiang added a comment -

          > If there weren't so many different create methods, I would suggest having one that takes just the enum value and an equivalent one that takes an EnumSet, so that if you just wanted to overwrite you could use the one that takes the enum, and the convenience method would jam it into an EnumSet. It's too bad Java won't automatically convert a single enum into an EnumSet when necessary.

          Agree. It would be better if we provided a create method that takes one CreateFlag enum value and an equivalent one that takes an EnumSet (I think the latter is already there). But what about the other existing create methods? Remove them? That would be an incompatible change.

          > We could use a variable arity parameter, e.g.: FSDataOutputStream create(Path file, CreateFlag... flags);
          This is a very good suggestion. Some programmers (like me) do not use EnumSet.of() very often. But the situation can be mitigated if we provide a method that accepts one CreateFlag parameter.

          Doug Cutting added a comment -

          > It's too bad Java won't automatically convert a single enum into an enumset when necessary.

          We could use a variable arity parameter, e.g.:
          FSDataOutputStream create(Path file, CreateFlag... flags);
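          A minimal sketch of how such a variable-arity overload could delegate to an EnumSet-based method. This is illustrative only: the CreateFlag constants mirror the patch, but the stub create() returning a String stands in for the real FileSystem.create and is not Hadoop API.

          ```java
          import java.util.EnumSet;

          public class VarargsCreateSketch {
              // Illustrative stand-in for org.apache.hadoop.fs.CreateFlag.
              enum CreateFlag { CREATE, OVERWRITE, APPEND }

              // The EnumSet-based method that would do the real work (stubbed here).
              static String create(String file, EnumSet<CreateFlag> flags) {
                  return file + " " + flags;
              }

              // The variable-arity convenience overload Doug suggests: callers write
              // create(path, CREATE, APPEND) and never touch EnumSet.of() directly.
              static String create(String file, CreateFlag first, CreateFlag... rest) {
                  return create(file, EnumSet.of(first, rest));
              }

              public static void main(String[] args) {
                  System.out.println(create("/tmp/f", CreateFlag.CREATE, CreateFlag.APPEND));
                  // prints: /tmp/f [CREATE, APPEND]
              }
          }
          ```

          The overload requires at least one flag, which also rules out an ambiguous zero-flag call at compile time.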

          Jakob Homan added a comment -

          So, as committed, there are now six create methods that take a boolean for whether or not to overwrite (and convert it to an EnumSet), and one create method that takes an EnumSet. HADOOP-3252 will definitely help with this, but it still seems odd and counterintuitive. If we have the CreateFlag enum, we should use it everywhere. Otherwise the methods are a bit confusing: a user will sit there going 'why are there so many create methods, and why is there this odd one out? Isn't the overwrite boolean the same as the CreateFlag enum?'

          If there weren't so many different create methods, I would suggest having one that takes just the enum value and an equivalent one that takes an EnumSet, so that if you just wanted to overwrite you could use the one that takes the enum, and the convenience method would jam it into an EnumSet. It's too bad Java won't automatically convert a single enum into an EnumSet when necessary.

          He Yongqiang added a comment -

          np. Filed a jira HADOOP-6138.

          Tsz Wo Nicholas Sze added a comment -

          Hi Yongqiang, I fixed HDFS-440. But the deprecated warnings are still there. It would be great if you can fix them.

          He Yongqiang added a comment -

          Have these two problems been resolved? If not, I will open another jira and resolve them.

          Tsz Wo Nicholas Sze added a comment -

          The patch introduces two problems:

          • javadoc warnings: broken links (HDFS-440)
          • deprecated warnings: some code in trunk still uses the old APIs, which were deprecated by the patch.
          Hudson added a comment -
          Integrated in Hadoop-trunk #863 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/ )
          dhruba borthakur added a comment -

          I just committed this. Thanks He!

          dhruba borthakur added a comment -

          +1 Code looks good. I will commit it tomorrow.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12408434/Hadoop-5438-2009-05-19.patch
          against trunk revision 777019.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/370/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/370/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/370/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/370/console

          This message is automatically generated.

          He Yongqiang added a comment -

          The earlier one can no longer be applied to trunk.
          Hopefully this time, when the patch is run in Hudson, it can be merged to trunk.

          Show
          He Yongqiang added a comment - The earlier one can not be applied to truck code again. Hope this time when the patch is run in Hudson, it can be merged to truck.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12408214/Hadoop-5438-2009-05-15.patch
          against trunk revision 776148.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/353/console

          This message is automatically generated.

          He Yongqiang added a comment -

          A new patch against trunk.
          Thanks, dhruba.

          dhruba borthakur added a comment -

          This patch does not merge with trunk. Can you please resubmit it? Thanks.

          He Yongqiang added a comment -

          The failed contrib test is not related to this patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12407719/Hadoop-5438-2009-05-10.patch
          against trunk revision 772960.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/321/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/321/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/321/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/321/console

          This message is automatically generated.

          He Yongqiang added a comment -

          The new patch file, against trunk.

          He Yongqiang added a comment -

          It seems the code has changed a lot and the patch cannot be applied. I will submit a new patch against trunk.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12407209/Hadoop-5438-2009-05-5.patch
          against trunk revision 772960.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/301/console

          This message is automatically generated.

          He Yongqiang added a comment -

          HADOOP-5596 was committed by Nicholas.
          Hadoop-5438-2009-05-5.patch is the corresponding patch, which relies on HADOOP-5596.

          dhruba borthakur added a comment -

          I would vote for closing HADOOP-5596 and mark it as related to HADOOP-5438.

          Tsz Wo Nicholas Sze added a comment -

          As we discussed in HADOOP-5596, we should not change ObjectWritable for supporting EnumSet. We should define a new class EnumSetWritable. We may declare FileSystem.create(..) with EnumSet and ClientProtocol.create(..) with EnumSetWritable.
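          The idea can be sketched in plain JDK I/O: encode the EnumSet as a bit vector, one bit per enum ordinal. This is an illustrative model only, not the actual EnumSetWritable code; the class and method names here are made up, and it assumes the enum has at most 64 constants.

          ```java
          import java.io.ByteArrayInputStream;
          import java.io.ByteArrayOutputStream;
          import java.io.DataInput;
          import java.io.DataInputStream;
          import java.io.DataOutput;
          import java.io.DataOutputStream;
          import java.io.IOException;
          import java.util.EnumSet;

          public class EnumSetBits {
              enum CreateFlag { CREATE, OVERWRITE, APPEND }  // illustrative flags

              // Bit i is set iff the constant with ordinal i is in the set.
              static <E extends Enum<E>> void write(DataOutput out, EnumSet<E> set)
                      throws IOException {
                  long bits = 0;
                  for (E e : set) bits |= 1L << e.ordinal();
                  out.writeLong(bits);
              }

              static <E extends Enum<E>> EnumSet<E> read(DataInput in, Class<E> type)
                      throws IOException {
                  long bits = in.readLong();
                  EnumSet<E> set = EnumSet.noneOf(type);
                  for (E e : type.getEnumConstants()) {
                      if ((bits & (1L << e.ordinal())) != 0) set.add(e);
                  }
                  return set;
              }

              public static void main(String[] args) throws IOException {
                  // Round-trip a set through a byte buffer.
                  ByteArrayOutputStream buf = new ByteArrayOutputStream();
                  write(new DataOutputStream(buf),
                        EnumSet.of(CreateFlag.CREATE, CreateFlag.APPEND));
                  EnumSet<CreateFlag> back = read(
                      new DataInputStream(new ByteArrayInputStream(buf.toByteArray())),
                      CreateFlag.class);
                  System.out.println(back);  // prints [CREATE, APPEND]
              }
          }
          ```

          A dedicated wrapper class like EnumSetWritable keeps this encoding out of ObjectWritable's generic dispatch, which is the point of Nicholas's suggestion.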

          He Yongqiang added a comment -

          Yeah, the import-statement changes in DFSClient.java are because some 'import ...*' statements were unfolded and replaced with the exact classes from those packages.
          Yes, we should add some enhancements for combinations with the OVERWRITE flag; more tests are always better.

          BTW, it seems Hadoop-5438(2009-04-06).patch already contains the patch I made for HADOOP-5596, so should I close HADOOP-5596 or remove the HADOOP-5596 code from Hadoop-5438(2009-04-06).patch?

          dhruba borthakur added a comment -

          This is continuing to look better now. What do folks think about completely deprecating FileSystem.append?

          There seem to be lots of changes to the "import ..." statements in DFSClient.java; is this a code cleanup?

          Does the unit test need to be enhanced to use a combination of flags along with the OVERWRITE flag?

          He Yongqiang added a comment -

          Hadoop-5438(2009-04-06).patch adds javadoc and the Apache license header for CreateFlag.
          Summary:
          1. added an enum type called CreateFlag for specifying the semantics of create()
          2. currently CreateFlag only includes: OVERWRITE, APPEND and CREATE
          3. Semantics:
          (1) CREATE: create the file if it does not exist, throw an exception if it already exists
          (2) OVERWRITE: overwrite the file if it already exists, create it if it does not exist
          (3) APPEND: append to the file if it exists, throw an exception if it does not exist
          4. combining OVERWRITE with either CREATE or APPEND does the same as using OVERWRITE alone
          5. combining CREATE and APPEND has the semantics: create the file if it does not exist, and append to it if it already exists
          6. users can combine modes with EnumSet.of(CreateFlag, CreateFlag...)
          7. because the current RPC system cannot serialize EnumSet, this issue depends on H-5596, which adds support for serializing EnumSet

          Thanks, dhruba borthakur, Konstantin Shvachko and Jakob Homan. All your suggestions are really helpful!
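The seven rules above can be sketched as a small decision helper. This is an illustrative model only: the `resolve` method, its string results, and the class name are hypothetical and not part of the patch; only the CreateFlag values come from the summary.

```java
import java.util.EnumSet;

public class CreateFlagSemantics {
    public enum CreateFlag { CREATE, OVERWRITE, APPEND }

    /**
     * Decide the action for a flag set and a file-exists state, per the
     * rules above. Errors are thrown as IllegalStateException for brevity
     * where the real API would throw an IOException.
     */
    public static String resolve(EnumSet<CreateFlag> flags, boolean exists) {
        if (flags.contains(CreateFlag.OVERWRITE)) {
            // Rule 4: OVERWRITE combined with anything behaves like OVERWRITE alone.
            return exists ? "overwrite" : "create";
        }
        boolean create = flags.contains(CreateFlag.CREATE);
        boolean append = flags.contains(CreateFlag.APPEND);
        if (create && append) {
            // Rule 5: create if absent, append if present.
            return exists ? "append" : "create";
        }
        if (create) {
            // Rule 3(1): plain CREATE fails on an existing file.
            if (exists) throw new IllegalStateException("file already exists");
            return "create";
        }
        if (append) {
            // Rule 3(3): plain APPEND fails on a missing file.
            if (!exists) throw new IllegalStateException("file does not exist");
            return "append";
        }
        throw new IllegalArgumentException("no flag given");
    }
}
```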

          He Yongqiang added a comment -

          This issue is now blocked by Hadoop-5596, which makes ObjectWritable support the EnumSet data type. EnumSet is used in the newly added NameNode create() interface.

          He Yongqiang added a comment -

          Attached Hadoop-5438-2009-03-31-2.patch, which passes the local TestFileCreation test with hadoop-5596 applied.

          He Yongqiang added a comment -

          1) attached an updated version according to dhruba's suggestions. Thanks, dhruba.
          2) created HADOOP-5596 for the EnumSet problem. I have tried to add EnumSet support in ObjectWritable; it seems it will not need much work.

          He Yongqiang added a comment -

          Thanks, dhruba.
          1. done
          2. done (the previous patch was like what you suggested.)
          3. done

          Before I attach the new patch, there is one serious problem: the patch cannot pass the test.
          It seems EnumSet cannot be serialized by ObjectWritable. Any ideas?

          dhruba borthakur added a comment -

          1. You have to bump up the versionID in ClientProtocol.
          2. Please leave the original public method signatures in FileSystem.java as they are (deprecate them). This is to ensure that existing apps do not break. Then create a new create method signature and make the old methods invoke the new create method internally.
          3. And, of course, we would like a unit test that invokes the new create method with all supported flags. One option would be to add a new method to TestFileCreation.java.

          He Yongqiang added a comment -

          A quick fix for this issue. It uses an enum for CreateFlag.
          All implemented FileSystems are modified.

          He Yongqiang added a comment -

          Will users feel comfortable with the verbose CreateFlag operations? I think it would be better if we made CreateFlag a static class, with which users can perform the | operation, like CreateFlag.CREATE|CreateFlag.APPEND. It would be much like the code in my initial proposal.
          So far we have all focused on the outside user interface. I think it is the most important part, and once it is settled, this issue can be fixed quickly.
          Would making it an enum be better than a class? I mean, not everyone is familiar with Java's EnumSet (at least not me).

          Jakob Homan added a comment -

          The reason that I have not assigned a short mode to each flag is that I think the flags in CreateFlag cannot coexist with each other, e.g. you cannot specify APPEND together with OVERWRITE.

          Right, that's what EnumSets are for:

          EnumSet<CreateFlag> flags = EnumSet.of(CreateFlag.APPEND, CreateFlag.OVERWRITE);
          

          But, like I said: verbose.

          He Yongqiang added a comment -

          The reason that I have not assigned a short mode to each flag is that I think the flags in CreateFlag cannot coexist with each other, e.g. you cannot specify APPEND together with OVERWRITE.

          Konstantin Shvachko added a comment -

          Sounds like a reasonable idea.

          Jakob Homan added a comment -

          You can also use these to combine modes, similar to what Dhruba was working on:

              static public int combineModes(EnumSet<CreateFlag> flags) {
                int combinedMode = 0;
                for(CreateFlag f : flags) {
                  combinedMode |= f.getMode();
                }
                  
                return combinedMode;
              }
          

          for passing on to C.
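For a self-contained illustration, the combineModes idea above can be made runnable together with the 0x01/0x02 mode values from the earlier comments (the class name ModePacking is illustrative, not from any patch):

```java
import java.util.EnumSet;

public class ModePacking {
    public enum CreateFlag {
        OVERWRITE((short) 0x01),
        APPEND((short) 0x02);

        private final short mode;
        CreateFlag(short mode) { this.mode = mode; }
        public short getMode() { return mode; }
    }

    // OR each flag's bit value into one int, suitable for handing to C code.
    public static int combineModes(EnumSet<CreateFlag> flags) {
        int combined = 0;
        for (CreateFlag f : flags) {
            combined |= f.getMode();
        }
        return combined;
    }
}
```

For example, combineModes(EnumSet.of(CreateFlag.OVERWRITE, CreateFlag.APPEND)) yields 0x01|0x02 = 3.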

          Jakob Homan added a comment -

          The Java-centric way to do C-style bit packing is with EnumSets (http://www.jakobhoman.com/2008/08/javas-enumset-fun-for-whole-family.html), which is future-proof but a bit verbose.

            enum CreateFlag {
              OVERWRITE ((short)0x01),
              APPEND    ((short)0x02);

              CreateFlag(short mode) { this.mode = mode; }

              public short getMode() { return mode; }
              private final short mode;
            }

            public static void createFile(Path p, EnumSet<CreateFlag> flags) {
              if (flags.contains(CreateFlag.APPEND)) {
                // blah
              }
            }
          

          We ran into a similar issue with create flags on Zookeeper (ZOOKEEPER-132), and ended up using them but had to enumerate all the possible values for creating actual integer flags.

          Konstantin Shvachko added a comment -

          Dhruba, this will not compile. Constructor is required.
          I'm not sure you can combine bits in an enum that way, but it would be nice.

          dhruba borthakur added a comment -

          +1 for Konstantin's proposal with a minor modification.

          enum CreateFlag {
              OVERWRITE ((short)0x01),
              APPEND    ((short)0x02);
              private short mode;
          
              private setCreateFlag(short mode) {
                this.mode &= mode;
              }
              short getMode() {return mode;}
          }
          
          

          > Then C-code will be able to use them, rather than converting.

          Is this really true? It actually depends on the internal JVM implementation and byte ordering, doesn't it?

          Konstantin Shvachko added a comment -

          You can use exact values, like

          enum CreateFlag {
              OVERWRITE ((short)0x01),
              APPEND    ((short)0x02);
              private short mode;

              private CreateFlag(short mode) {
                this.mode = mode;
              }
              short getMode() {return mode;}
          }
          

          Then C-code will be able to use them, rather than converting.

          He Yongqiang added a comment -

          Is an enum like this OK?

          static enum CreateFlag {
          	CREATE, OVERWRITE, APPEND, TRUNCATE;
           }
          

          And change FileSystem.create to:

              public FSDataOutputStream create(Path f, FsPermission permission,
                  boolean overwrite, int bufferSize, short replication,
                  long blockSize, Progressable progress) throws IOException {
                return this.create(f, permission, overwrite ? CreateFlag.OVERWRITE
                    : CreateFlag.CREATE, bufferSize, replication, blockSize, progress);
              }

              public abstract FSDataOutputStream create(Path f, FsPermission permission,
                  CreateFlag flag, int bufferSize, short replication, long blockSize,
                  Progressable progress) throws IOException;
          
          Konstantin Shvachko added a comment -

          +1 for Option 2, even though it requires more changes.
          Can you also make Flag (or maybe CreateFlag) a real enum rather than a class?

          dhruba borthakur added a comment -

          > And I think TRUNCATE is of the same meaning as overwrite, i

          TRUNCATE should keep the original file metadata (user permissions, block size, etc)...

          He Yongqiang added a comment -

          Thanks, dhruba.
          Yeah, Option 2 is generally more like the POSIX way. But it seems HDFS only provides two ways of writing, append or overwrite. And I think TRUNCATE has the same meaning as overwrite, is that right?

          dhruba borthakur added a comment -

          I am fine either way, but I strongly prefer Option 2 because it manages the FileSystem API better and allows adding other flags in the future. For example, we might want to support a TRUNCATE flag.

          He Yongqiang added a comment -

          I prefer the approach of adding a new param to append(). It needs little modification to user apps, and the modifications to the HDFS code would be only a few lines.

          dhruba borthakur added a comment -

          I am ok with adding yet another parameter to FileSystem.create(). This will be a parameter of type FileSystem.OpenFlags or something like that.

          He Yongqiang added a comment -

          Yeah, it would be more POSIX-like if we added this to FileSystem.open(). But Hadoop's FileSystem.open() opens an InputStream for reading, while FileSystem.create() and FileSystem.append() open an OutputStream for writing. I think it would be easier to add this to create() or append().
          I think we have two options here:
          1) add a boolean flag to append() to indicate whether we should create the file if it does not exist
          2) add an enum type (let's call it Flag or sth. like that) and pass this mode param to create(), like:

           public abstract FSDataOutputStream create(Path f,
               FsPermission permission,
               // boolean overwrite,
               short mode,
               int bufferSize,
               short replication,
               long blockSize,
               Progressable progress) throws IOException;

          static class Flag {
              private static final short OVERWRITE = (short) 0x01;
              private static final short APPEND = (short) 0x02;
              private short mode;

              public Flag(short mode) {
                  this.mode = mode;
              }

              public boolean isOverwrite() {
                  return (this.mode & OVERWRITE) > 0;
              }

              public boolean isAppend() {
                  return (this.mode & APPEND) > 0;
              }
          }
          

          1) and 2) have the same effect. 2) is more POSIX-like, but almost every supported file system has implemented create(), so it needs more modification to the code.

          With 1), we only need to add a few lines in FSNamesystem's startFileInternal, like this:

           if (append) {
                  if (myFile == null) {
          +           if(create){
          +              startFileInternal(src, permissions, holder, clientMachine, false, false, replication, blockSize); // create the file.
          +           } else{
                         throw new FileNotFoundException("failed to append to non-existent file "
                                                 + src + " on client " + clientMachine);
                       }
                  } else if (myFile.isDirectory()) {
                    throw new IOException("failed to append to directory " + src 
                                          +" on client " + clientMachine);
                  }
                } else if (!dir.isValidToCreate(src)) {
                  if (overwrite) {
                    delete(src, true);
                  } else {
                    throw new IOException("failed to create file " + src 
                                          +" on client " + clientMachine
                                          +" either because the filename is invalid or the file exists");
                  }
                }
          

          Please comment on which option we should choose, or suggest other options.

          dhruba borthakur added a comment -

          I like the idea, but isn't it better if, instead of adding a boolean flag, we introduce a bit-map (i.e. an enum type) so that various flags can be passed into the "append" method? Also, maybe it is appropriate to call it FileSystem.open() to make it more POSIX-like.

          He Yongqiang added a comment -

          How about we add a new boolean param for append()?

            public abstract FSDataOutputStream append(Path f, int bufferSize,boolean create,Progressable progress) throws IOException;
          

          Since this is an abstract method on FileSystem, all implemented file systems would need to be modified to support it. I have checked all implemented file systems, and it seems currently only DistributedFileSystem and RawLocalFileSystem support append; the others just throw a Not Supported exception.

          Any comments?

          Konstantin Shvachko added a comment -

          FileSystem.append() semantically simply opens a file for append, and open() usually creates a file if it does not exist (with appropriate flags). So it seems to me reasonable to merge these methods to make them more POSIX friendly.

          He Yongqiang added a comment -

          I guess the open and create interfaces can stay for compatibility, but can we add a new interface for open? With the new open interface, all the work can be done with only one RPC, and atomicity can be guaranteed on the server side. That's great, since atomicity cannot be achieved on the client side.

          Atomicity is a great point; thanks, Hong.

          Hong Tang added a comment -

          fopen with mode "a+", or open with flag=O_CREAT|O_APPEND, actually has an extra atomicity guarantee that the check-existence-then-create-or-append sequence does not.

          Zheng Shao added a comment -

          You can create a wrapper method like createOrAppend to provide fopen "a+" semantics.

          However, we should not change the semantics of existing functions.
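Such a wrapper can be sketched as follows. The Fs interface and MemFs fake below are hypothetical stand-ins so the example runs without Hadoop on the classpath; the real types would be org.apache.hadoop.fs.FileSystem, Path and FSDataOutputStream. Note the wrapper makes two calls (two RPCs against HDFS), so it lacks exactly the atomicity a server-side flag would provide.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

public class CreateOrAppendSketch {
    // Minimal stand-in for the FileSystem operations the wrapper needs.
    public interface Fs {
        boolean exists(String path);
        OutputStream create(String path) throws IOException;
        OutputStream append(String path) throws IOException;
    }

    /**
     * fopen("a+")-style helper: append when the file exists, create otherwise.
     * Not atomic: another client may create or delete the file between the
     * exists() check and the create()/append() call.
     */
    public static OutputStream createOrAppend(Fs fs, String path) throws IOException {
        return fs.exists(path) ? fs.append(path) : fs.create(path);
    }

    /** Trivial in-memory fake, only for demonstrating the wrapper. */
    public static class MemFs implements Fs {
        private final Map<String, ByteArrayOutputStream> files = new HashMap<>();
        public boolean exists(String path) { return files.containsKey(path); }
        public OutputStream create(String path) throws IOException {
            if (files.containsKey(path)) throw new IOException("exists: " + path);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            files.put(path, out);
            return out;
        }
        public OutputStream append(String path) throws IOException {
            ByteArrayOutputStream out = files.get(path);
            if (out == null) throw new IOException("no such file: " + path);
            return out;
        }
        public byte[] contents(String path) { return files.get(path).toByteArray(); }
    }
}
```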


            People

            • Assignee:
              He Yongqiang
              Reporter:
              He Yongqiang
            • Votes:
              0
              Watchers:
              9
