Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: datanode, namenode
    • Labels: None

      Description

       Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. HDFS currently does not support truncate (a standard POSIX operation), which is the reverse of append. This forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to get around this limitation of HDFS.
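       To make the intended use concrete, here is a minimal sketch of how a transactional upper layer might roll back an aborted transaction if a truncate call along the lines proposed in this issue were available. The FileSystem.truncate(Path, long) method used below is the proposed API, not something HDFS offered when this issue was filed, and the recordedLengths map is a hypothetical bookkeeping structure kept by the application.

```java
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch only: rolls a set of files back to the lengths recorded when the
 * transaction started. Assumes a truncate(Path, long) method along the lines
 * proposed in this issue; it is not an existing HDFS API in this discussion.
 */
public class TransactionAbortSketch {
  public static void rollback(FileSystem fs, Map<Path, Long> recordedLengths)
      throws IOException {
    for (Map.Entry<Path, Long> e : recordedLengths.entrySet()) {
      // Undo any bytes appended after the transaction began by cutting the
      // file back to its pre-transaction length (proposed/hypothetical API).
      fs.truncate(e.getKey(), e.getValue());
    }
  }
}
```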


          Activity

          Lei Chang added a comment -

          > The proposed way to go about #3 by creating copies at the DN level and truncating there seems messy, but if you think about it as a variant of #2 that leaks less information into the API (block boundaries, contents of last segment), it seems simpler to me.

          Agree with you: if we look only at the simplicity of the internal RPC APIs, #3 is simpler. From the implementation side, however, in #3 the client needs to work with both the NN and the DNs. There are many cases the client must handle when some nodes fail in the copy/truncate phase and others succeed. For example:
          1) The client has to work with the NN to handle the failures and do some recovery when a DN fails. It is somewhat like the pipeline rebuild and recovery in the APPEND case.
          2) Client failure introduces some extra work too. (#1 also has to deal with this case, but it is simpler there.)

          Thus, the implementation of #2 should be easier.

          You raised a good point about the security of the temporary file. It should be created with the same access privileges as the file being truncated.

          Scott Carey added a comment -

          I think #2 and #3 can be the same thing, except that the system creates "concatFile" by reading some number of bytes from the last block, and that #3 does not throw an error if the passed-in length is not on a block boundary.

          The proposed way to go about #3 by creating copies at the DN level and truncating there seems messy, but if you think about it as a variant of #2 that leaks less information into the API (block boundaries, contents of last segment), it seems simpler to me.

          Lei Chang added a comment -

          Indicating EOF is reasonable, and this is exactly the behavior of the preliminary prototype we built earlier.

          M. C. Srivas added a comment -

          @Zhanwei and @TszWo: A comment on the truncate & read interaction: the behavior of the read() system call in POSIX is to return fewer bytes than asked for if EOF is encountered early. For example, if a file is 100 bytes long and a thread tries to read 200 bytes starting at offset 20, then read() should return 80. Subsequent calls to read() then return 0, to indicate EOF. The same principle can be applied to a file that gets truncated after it is opened for read: treat it like a file that got shortened, i.e., do a short read the first time and raise the EOF exception subsequently.
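          A minimal, self-contained Java sketch of the short-read-then-EOF behavior described above, using an in-memory stream in place of an HDFS file; Java streams signal EOF with -1 where the POSIX read() call returns 0, but the principle is the same.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ShortReadSketch {
  public static void main(String[] args) throws IOException {
    byte[] file = new byte[100];                 // stands in for a 100-byte file
    InputStream in = new ByteArrayInputStream(file);
    long skipped = in.skip(20);                  // reader positioned at offset 20

    byte[] buf = new byte[200];
    int n = in.read(buf);                        // ask for 200 bytes
    System.out.println(skipped + " skipped, " + n + " read"); // 20 skipped, 80 read
    System.out.println(in.read(buf));            // -1: EOF on the next call
  }
}
```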

          Tsz Wo Nicholas Sze added a comment -

          > ... I also think it is better to not change the API ...

          Agreed. Milind's idea of changing only the RPC and keeping the API unchanged is great.

          Tsz Wo Nicholas Sze added a comment -

          > A problem of truncate is the "visibility". ...

          @Zhanwei, the behavior of truncate should be similar to delete. For delete, readers can continue reading the file after it is deleted; the reader fails only when it next talks to the Namenode. For truncate, it seems okay to let the reader continue reading, even beyond the truncated length, and fail when it next talks to the Namenode. What do you think?

          > One minor comment: please think about maintenance problems when you expose funky semantics ...

          @M.C., I got your point. We probably should annotate the new API as unstable or evolving in the first release.

          Lei Chang added a comment -

          > Lei, do you see any issues with the proposal (i.e. option 2)?

          I do like the second option, for the simplicity with which it achieves atomicity. I also think it is better not to change the API, which makes it much easier for end users. truncate(file, length, concatFile) can be used internally.

          As for the third option, it indeed introduces a lot of implementation difficulties, such as fault-tolerance issues. We saw this in our first attempt to implement truncate, which used a stronger semantic.

          IMO, for visibility, weak consistency for concurrent reads is OK for upper-layer applications. For instance, most database systems already use their own locks to synchronize concurrent access to files or file blocks.

          Zhanwei.Wang added a comment -

          To add more detail to my previous question: how do we define "may read content of a file that will be truncated"? That is the "visibility" problem. If a file is opened and read just before truncation, should the truncated data be visible? Or does it just depend on the progress of the truncation? What if a file is opened before truncation and read after truncation?

          M. C. Srivas added a comment -

          One minor comment: please think about maintenance problems when you expose funky semantics that have been tacked on to truncate() ... people will start using it, and it will be hard/impossible to change. It is easy to add code, but very difficult to remove it later.

          I see that you need something like what's being proposed to implement snapshots, but it should be an internal-only API and not exposed.

          Zhanwei.Wang added a comment -

          One problem with truncate is "visibility". Since truncating a file requires getting the lease first, we do not need to worry about concurrent writes, but we do need to handle concurrent reads when we truncate a file. The HDFS client buffers some block info when it opens and reads a file, and those blocks may be truncated. Furthermore, the socket and the HDFS client may buffer data that will be truncated.

          In the first edition of my truncate prototype, if the block or data the client requested had been truncated, the datanode threw an exception and the client updated the metadata to check whether the data had been truncated or a real error had happened. But this cannot prevent the client from reading already-buffered data.

          Any comments or suggestions?
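          A hedged sketch of the failure handling described in this comment; DatanodeReader, NamenodeMeta, and the method names below are stand-ins, not HDFS classes, and the logic only illustrates the idea of re-checking metadata to distinguish a truncated range from a real error.

```java
import java.io.IOException;

/** Sketch only: the types below are stand-ins, not real HDFS classes. */
public class TruncateAwareReadSketch {

  interface DatanodeReader {
    /** Reads from a block replica; throws IOException if the range is gone. */
    int read(long blockId, long offsetInFile, byte[] buf) throws IOException;
  }

  interface NamenodeMeta {
    /** Re-fetches the file's metadata and returns its current length. */
    long refreshLength(String file) throws IOException;
  }

  static int readWithTruncateCheck(DatanodeReader dn, NamenodeMeta nn,
      String file, long blockId, long offsetInFile, byte[] buf) throws IOException {
    try {
      return dn.read(blockId, offsetInFile, buf);
    } catch (IOException e) {
      // As in the prototype described above: on failure, refresh the metadata
      // to tell "the data was truncated" apart from a genuine error.
      long newLength = nn.refreshLength(file);
      if (offsetInFile >= newLength) {
        return -1; // the read position no longer exists: report EOF
      }
      throw e;     // a real error; surface it to the caller
    }
  }
}
```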

          Milind Bhandarkar added a comment -

          > We don't need unowned block. The block should be owned by the user since the data of the block is.

          Sorry, by "unowned" I meant a block that does not belong to any file. If separate block management becomes an external first-class citizen, one could have existing blocks added to a new namespace (from what I remember of the vision discussions with Sanjay last year), and concat could take a list of blocks instead of a file.

          Lei, do you see any issues with the proposal (i.e. option 2)?

          Tsz Wo Nicholas Sze added a comment -

          > ... The namenode RPC should have length as exactly a multiple of blocksize ...

          Agreed.

          > Is this futureproof? What happens when block storage becomes a first-class citizen? Will we have the ability to create an unowned block and append it to an existing (truncated to block boundary) file?

          I think you mean separating namespace management and block management. The beauty of truncate with concat is that it does not introduce new block-level operations, so there is no block-management change (arguably, it would make sense to add a new operation for copying an existing block to a new block so that the copy can be done locally on the datanodes, but this is not a big deal). It shouldn't cause any problems.

          We don't need unowned block. The block should be owned by the user since the data of the block is.

          Milind Bhandarkar added a comment -

          Thought a little more about this. The namenode RPC should require length to be exactly a multiple of the blocksize, and otherwise throw an exception (so, option 1 in my comment above). And concatFile could be null, so that the case where a file needs to be truncated to exactly a block boundary (option 1 in Nicholas's comment) can be trivially supported.

          Milind Bhandarkar added a comment -

          I really like Sanjay's idea (i.e. 2) for its simplicity, atomicity, and ease of implementation. However, I would like the client API to be truncate(file, length), and the namenode RPC to be truncate(file, length, concatFile).

          In the client API, i.e. the method in DFSClient, length could be anything less than the file size, whereas for the namenode RPC we have two options:

          1. length must be a multiple of the file's blocksize and less than the filesize

          2. length could be anything less than the filesize, but the namenode will round it down: length = length - (length % blocksize).

          For the client API to support any length without specifying concatFile, we need to add o.a.h.fs.FileUtils.createTempFile.

          Thoughts?

          Is this futureproof? What happens when block storage becomes a first-class citizen? Will we have the ability to create an unowned block and append it to an existing (truncated to block boundary) file?
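          A hedged sketch of how the client-side wrapper described above could map a public truncate(file, length) onto the proposed namenode RPC truncate(file, length, concatFile). The TruncateRpc and TailCopier interfaces below are stand-ins (nothing by these names exists in HDFS); the point is the rounding to a block boundary, length - (length % blockSize), and the optional temporary concatFile.

```java
import java.io.IOException;

/**
 * Sketch only: the interfaces below stand in for the proposed namenode RPC
 * and for a helper like the o.a.h.fs.FileUtils.createTempFile mentioned
 * above; none of these names exist in HDFS as of this discussion.
 */
public class ClientTruncateSketch {

  /** Stand-in for the proposed namenode RPC truncate(file, length, concatFile). */
  interface TruncateRpc {
    void truncate(String file, long lengthAtBlockBoundary, String concatFile)
        throws IOException;
  }

  /** Stand-in for copying file bytes [from, to) into a temporary one-block file. */
  interface TailCopier {
    String copyTail(String file, long from, long to) throws IOException;
  }

  private final TruncateRpc namenode;
  private final TailCopier copier;

  ClientTruncateSketch(TruncateRpc namenode, TailCopier copier) {
    this.namenode = namenode;
    this.copier = copier;
  }

  /** Client API: truncate to any length by mapping onto the block-boundary RPC. */
  void truncate(String file, long length, long blockSize) throws IOException {
    long boundary = length - (length % blockSize); // round down to a block boundary
    String concatFile = null;
    if (boundary < length) {
      // Preserve the partial tail in a temporary file so the namenode can
      // concat it back after truncating the file to the boundary.
      concatFile = copier.copyTail(file, boundary, length);
    }
    namenode.truncate(file, boundary, concatFile);
  }
}
```

          Whether the rounding happens in the client (as sketched) or in the namenode (option 2 above) only moves the same arithmetic across the RPC boundary.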

          Tsz Wo Nicholas Sze added a comment -

          The full truncate feature may be hard to implement. Below are some ideas.

          (1) Support only block-boundary truncate.

          When the length is not a multiple of the block size, throw an exception.

          This is very easy to implement. The client could use it to support full truncate by (i) copying the data in the truncated block, (ii) truncating to the block boundary and (iii) appending the data back.

          Example 1: Suppose the file size is 290 and the block size is 100. Then, truncate(file, 180) can be done by (i) reading bytes from position 101 to 180, (ii) truncate(file, 100) and (iii) appending the 80 bytes back.

          (2) Truncate with concat. (Sanjay's idea)

          A problem with (1) is that it is not atomic. It may end up finishing (ii) but failing at (iii). A remedy is to add a parameter so the API becomes truncate(file, length, concatFile), where length must be a multiple of the block size. HDFS will first truncate file to length and then concatenate the block in concatFile to the end of file. Note that this is a namenode-only operation and can easily be implemented atomically.

          Example 2: For the case in Example 1, truncate can be done by first copying bytes 101 to 180 to a new file f and then calling truncate(file, 100, f).

          (3) Copy on truncate.

          Support full truncate by first copying the last block to a new block, then truncating the replicas of the new block at the datanodes, and then committing the truncate once the datanodes report the new block to the namenode. Rollback is possible since the old block is still around. This is harder to implement than (1) or (2).
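          For concreteness, the arithmetic behind Example 1 as a small self-contained check; the positions in the comment are 1-based, so "bytes 101 to 180" are the 80 bytes between the block boundary at offset 100 and the target length 180.

```java
public class Example1Arithmetic {
  public static void main(String[] args) {
    long blockSize = 100, target = 180;               // file size 290, truncate(file, 180)
    long boundary = (target / blockSize) * blockSize; // 100: last block boundary <= 180
    long tail = target - boundary;                    // 80 bytes that must be preserved
    // Option (1): read the tail, truncate(file, boundary), append the tail back.
    // Option (2) performs the same split, but the truncate+concat step is atomic.
    System.out.println("truncate to " + boundary + ", then restore " + tail + " bytes");
  }
}
```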

          Milind Bhandarkar added a comment -

          Thanks Lei for addressing Nicholas's comments

          Lei Chang added a comment -

          Good comments, Nicholas.

          Attached a revised file. It looks cleaner now.

          Milind Bhandarkar added a comment -

          Thanks, Nicholas.

          Yes, we are implementing it. If there is agreement (pending discussion on the hdfs-dev list) that it is a useful feature, we will upload the design, and a patch will follow.

          Tsz Wo Nicholas Sze added a comment -

          > Suresh, Nicholas, Eli: any opinions about the proposed API and semantics?

          The API looks reasonable to me. How about the implementation? Do you have a plan to implement it?

          A minor comment: the usual practice for referring to other documentation is to include links or a list of references rather than copying and pasting.

          Eli Collins added a comment -

          > Is the proposal to remove appends from all 1.x+ versions of Hadoop or just the 1.x versions?

          All versions. See the mail I just sent to hdfs-dev@; let's continue the discussion there.

          Milind Bhandarkar added a comment -

          I understand that truncate adds more complexity, and I have discussed the design at length offline with Sanjay and Hairong. We plan to reuse the append pipeline for this, and have therefore restricted the API to work only with closed files. (We have submitted the exact use case as a presentation proposal to Hadoop Summit, without exposing it to public voting currently, but hopefully we will be able to announce it publicly in a few weeks.) The transaction feature is not HDFS-specific; it is at the application level and works with other file systems that support truncate.

          > I don't follow... we don't even expose append() via the shell.

          Indeed, I was not talking about Apache Hadoop, but about a distribution that includes this feature.

          > Otherwise I'm more inclined to agree with Eli's suggestion to remove append entirely (please continue that discussion on-list, though).

          Is the proposal to remove appends from all 1.x+ versions of Hadoop or just the 1.x versions?

          Todd Lipcon added a comment -

          IMO adding truncate() adds a bunch of non-trivial complexity. It's not so much because truncating a block is that hard – but rather because it breaks a serious invariant we have elsewhere that blocks only get longer after they are created. This means that we have to revisit code all over HDFS – in particular some of the trickiest bits around block synchronization – to get this to work. It's not insurmountable, but I would like to know a lot more about the use case before commenting on the API/semantics.

          Maybe you can open a JIRA or upload a design about your transactional HDFS feature, so we can understand the motivation better? Otherwise I'm more inclined to agree with Eli's suggestion to remove append entirely (please continue that discussion on-list, though).

          > After appends were enabled in HDFS, we have seen a lot of cases where a lot of (mainly text, or even compressed text) datasets were merged using appends.

          > This is where customers realize their mistake immediately after starting to append, and do a ctrl-c.

          I don't follow... we don't even expose append() via the shell. And if we did, would users actually be using "fs -append" to manually write new lines of data into their Hadoop systems??

          Milind Bhandarkar added a comment -

          Suresh, Nicholas, Eli: any opinions about the proposed API and semantics?

          Milind Bhandarkar added a comment -

          Yes, I am using the term "append" loosely, because of FB's 20-append branch. Our transaction work is done with 0.23.x.

          Eli Collins added a comment -

          > Since appends were enabled very recently, only users with Facebook's version of Hadoop or Hadoop 1.0 are doing this now.

          Append doesn't work on Hadoop 1.0; see HDFS-3120. I'm actually going to start a discussion about removing append entirely on hdfs-dev@.

          Milind Bhandarkar added a comment -

          > That's okay. You missed the smiley in the tweet too.

          I just copy-pasted, so that was expected.

          > I see. I was not aware it was that common.

          Since appends were enabled very recently, only users with Facebook's version of Hadoop or Hadoop 1.0 are doing this now. Before this, users were creating multiple files.

          In any case, my interest in this feature is for implementing transactions over HDFS (as Lei and I have already discussed with Sanjay Radia and Hairong.) And aborting a transaction means truncating to the last known good data across multiple files.

          Suresh Srinivas added a comment -

          > I must have missed a smiley

          That's okay. You missed the smiley in the tweet too.

          > This is very common.

          I see. I was not aware it was that common.

          Milind Bhandarkar added a comment -

          I must have missed a smiley

          Nicholas,

          After appends were enabled in HDFS, we have seen a lot of cases where a lot of (mainly text, or even compressed text) datasets were merged using appends.

          This is where customers realize their mistake immediately after starting to append, and do a ctrl-c.

          This is very common.



          Tsz Wo Nicholas Sze added a comment -

          Easy, Milind. I do agree with Suresh that (2) is not a very good reason to have truncate; I think such accidents are rare. However, you made a good point that having append without truncate is a deficiency.

          Milind Bhandarkar added a comment -

          What if a user accidentally deletes a directory? You never supported me when I asked for file-by-file deletion, which could be aborted in time to save 70 percent of users' time, right? Instead you have always supported directory deletion with a single misdirected RPC.

          Anyway, to answer your question: if a user accidentally truncates, he/she can always append again, without losing any efficiency.

          Can we have a mature discussion on this JIRA, please?



          Suresh Srinivas added a comment -

          > if a user mistakenly starts to append data to an existing large file, and discovers the mistake, the only recourse is to recreate that file, by rewriting the contents. This is very inefficient.

          What if a user accidentally truncates a file?

          Milind Bhandarkar added a comment -

          This will be a great addition to HDFS for a couple of reasons:

          1. Having an append without a truncate is a serious deficiency.
          2. If a user mistakenly starts to append data to an existing large file, and discovers the mistake, the only recourse is to recreate that file, by rewriting the contents. This is very inefficient.


            People

            • Assignee: Unassigned
            • Reporter: Lei Chang
            • Votes: 4
            • Watchers: 28


                Time Tracking

                Estimated: 1,344h
                Remaining: 1,344h
                Logged: Not Specified
