Hadoop HDFS / HDFS-1111

getCorruptFiles() should give some hint that the list is not complete

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The list of corrupt files returned by the namenode doesn't say anything when the number of corrupted files is larger than the call output limit (which means the list is not complete). There should be a way to hint at this incompleteness to clients.

      A simple hack would be to add an extra entry to the array returned with the value null. Clients could interpret this as a sign that there are other corrupt files in the system.
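
      As a minimal sketch (purely illustrative; the helper class, its name, and the FileStatus element type are assumptions, not part of any patch), a client could interpret such a trailing null like this:

        import org.apache.hadoop.fs.FileStatus;

        // Hypothetical helper: a trailing null entry in the returned array is the
        // hint that the list of corrupt files was truncated by the namenode.
        class CorruptFileListing {
          static boolean isTruncated(FileStatus[] corrupt) {
            return corrupt.length > 0 && corrupt[corrupt.length - 1] == null;
          }

          static int reportedCount(FileStatus[] corrupt) {
            return isTruncated(corrupt) ? corrupt.length - 1 : corrupt.length;
          }
        }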

      We should also rephrase the fsck output so that it is more confident when the list is complete and less confident when the list is known to be incomplete.

      Attachments

      1. HDFS-1111-y20.2.patch
        23 kB
        Sriram Rao
      2. HDFS-1111-y20.1.patch
        16 kB
        Sriram Rao
      3. HDFS-1111.trunk.patch
        39 kB
        Sriram Rao
      4. HDFS-1111.trunk.2s.patch
        56 kB
        Konstantin Shvachko
      5. HDFS-1111.trunk.2.patch
        55 kB
        Sriram Rao
      6. HDFS-1111.trunk.1.patch
        55 kB
        Sriram Rao
      7. HADFS-1111.0.patch
        23 kB
        Rodrigo Schmidt


          Activity

          Rodrigo Schmidt added a comment -

          Dhruba, what do you think of returning an extra null entry when the list is known to be incomplete?

          It looks a little hacky, but does the job quite well with minor code changes.

          We can't just test the size of the output provided at the client side because "dfs.corruptfilesreturned.max" is a server-side configuration and clients might have the wrong value. The hint must come from the namenode in some way.

          Andrew Ryan added a comment -

          Overall this feature is a great thing to have. Running full fsck is not practical on a very large cluster, it takes a long time (> 1 hour) and uses a lot of resources on the namenode.

          Here's an example of the current output:
          """
          hadoop fsck / -list-corruptfiles
          Here are a few files that may be corrupted:
          ===========================================
          /tmp/hive-root/hive_2010-04-23_15-01-29_568_7146600693398653716/_tmp.10001/_tmp.attempt_201004162336_104941_r_000119_0

          There is at least 1 corrupt file under '/', which is CORRUPT
          """

          From an admin's point of view this message is bad for several reasons. "a few files that may be corrupted" is not very confident language.
          1) It is not clear if this is the only file corrupted or if there are more.
          2) It is not clear if this file is or is not corrupted.
          3) It is not clear if there is any recourse, other than running 'fsck /', to get more details.

          Thanks!

          Rodrigo Schmidt added a comment -

          Andrew, I would propose the following to address your points:

          1) Make it clear whether the list is complete or there might be other corrupt files not listed.
          2) The "may be" must be taken out. It's known for sure that the listed files ARE corrupted at the moment of the call.
          3) Whenever the list is incomplete, the output should suggest running a complete fsck, which is the guaranteed way to list all the corrupt files.

          Is that fine with you?

          André Oriani added a comment -

          [ I started writing before Rodrigo's second comment]

          After resolving HDFS-1031 and having acquired more experience on the problem, I think:

          1) Saying "Here are a few files that may be corrupted:" is definitely wrong. The files are corrupted for sure. The doubt is whether the list is complete or not.
          2) The output is not sorted. A sorted output would make an admin's life easier (new Jira?)
          3) fsck handles options in a very simple way. "fsck / -move -delete" is accepted although the move and delete options are mutually exclusive (new Jira?)
          4) Changes here shall be reflected on HDFS-1031 (add a link to it?)
          5) @ Rodrigo : why not return a struct comprised of the list and a flag saying whether the list is complete? I dunno if it is the best thing to do, but at least it would make things more intuitive.

          Rodrigo Schmidt added a comment -

          André, here is my take on your points:

          1) Agree!
          2) Don't think we need a new Jira just to sort the output. It could be done inside this one or some other JIRA we might create (it's just a sort call anyway).
          3) Definitely a new Jira
          4) Done!
          5) I wouldn't like to create yet another class that will be used only by this method. If we could do it in some simple and intuitive way, it would be better.

          André Oriani added a comment -

          If it were C++ I would consider something like

          public:
          FileStatus[] Namenode::getCorruptFiles(bool& isComplete) {
              ...
          }

          translating it to Java will be something like

          public FileStatus[] getCorruptFiles(boolean[] isComplete) {
              ...
          }

          Okay, I agree. It is too ugly. I don't even know if it would work with RPC.

          Rodrigo Schmidt added a comment -

          I talked to Dhruba about this. Since we might want to extend this API in the future, it sounds better to create a new return class.

          @André: I think using the class Boolean would work in plain, non-RPC Java (no need to create the array). Anyway, the RPC mechanism passes parameters by value as far as I remember, which means we have to change the return class.
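
          As a rough sketch (not the actual patch; the class name and fields are guesses based on the isComplete() accessor quoted later in this thread, and a real version would also have to implement Writable for RPC), such a return class could look like:

            import org.apache.hadoop.fs.FileStatus;

            // Sketch of a possible return type: the corrupt-file listing plus a flag
            // telling the caller whether the list is exhaustive.
            public class CorruptFileStatuses {
              private final FileStatus[] files;
              private final boolean complete;

              public CorruptFileStatuses(FileStatus[] files, boolean complete) {
                this.files = files;
                this.complete = complete;
              }

              public FileStatus[] getFiles() { return files; }

              /** True if this listing contains every corrupt file known to the namenode. */
              public boolean isComplete() { return complete; }
            }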

          dhruba borthakur added a comment -

          Maybe we can have the following method:

          int getCorruptFiles(List<FileStatus> files) throws AccessControlException, IOException;

          The call passes in an empty list. The NameNode fills up this list with the FileStatus objects of the first 500 corrupt files. The return value of this method is the total number of corrupted files in the system.
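
          A hedged usage sketch of this proposal (the stand-in interface and the truncation check are illustrative; only the fill-the-list-and-return-the-total behaviour comes from the description above):

            import java.io.IOException;
            import java.util.ArrayList;
            import java.util.List;
            import org.apache.hadoop.fs.FileStatus;

            // Illustrative caller: the namenode fills the supplied list with up to the
            // first 500 corrupt files and returns the total count, so comparing the two
            // tells the caller whether the list is complete.
            class CorruptFilesCaller {
              // Stand-in for the proposed protocol method; not a real Hadoop interface.
              interface CorruptFilesProtocol {
                int getCorruptFiles(List<FileStatus> files) throws IOException;
              }

              static void printCorruptFiles(CorruptFilesProtocol namenode) throws IOException {
                List<FileStatus> files = new ArrayList<FileStatus>();
                int total = namenode.getCorruptFiles(files);
                for (FileStatus f : files) {
                  System.out.println(f.getPath());
                }
                if (total > files.size()) {
                  System.out.println("... plus " + (total - files.size())
                      + " more corrupt files not listed");
                }
              }
            }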

          Rodrigo Schmidt added a comment -

          getCorruptFiles() is currently an RPC and I think it's a good idea to have it like that, even though the Fsck usage doesn't require it (the RaidNode will eventually call this to recover files automatically). I haven't checked but I thought the RPC protocol we use passes parameters by value. That is, they are not returned back from the server. I imagine that passing parameters by copy (as you proposed) would make RPC calls unnecessarily expensive in the general case.

          dhruba borthakur added a comment -

          Ok, got it, thanks.

          Rodrigo Schmidt added a comment -

          Patch with all previous suggestions.

          Rodrigo Schmidt added a comment -

          I don't know why Hudson is not picking up this patch.

          Konstantin Shvachko added a comment -
          1. If you change ClientProtocol you should increment the protocol version.
          2. So many changes and new classes, with the only outcome that fsck prints "ALL" or "A FEW" in the output instead of the current "a few". I am curious when it is necessary to know whether these are all the corrupt files or there are more.
          Rodrigo Schmidt added a comment -

          Thanks for the comments, Konstantin!

          You are right about 1. I'll change that.

          As for 2, those words make a big difference for ops people, especially if they are running fsck -list-corruptfiles on a subdirectory. Knowing that the list is empty because there are no corrupt files, instead of wondering whether it is empty because the (limited) list returned was filled with corrupt files from a different directory, makes all the difference for them.

          Let me give you an example: some time ago we had a problem and many files got corrupted. We were using fsck -list-corruptfiles because it was faster and more direct, but we wanted to focus on important directories first. We ran fsck -list-corruptfiles /path/to/important/dir but it returned an empty list. This was weird because we knew there were corrupt files there. The problem was that we filter by directory after we get the (limited) list from the namenode. For that reason, the list had been truncated to files in different directories, and the output was ambiguous.

          Although the new code makes just a minor change to the interface, its meaning makes a huge impact to the user.

          Konstantin Shvachko added a comment -

          Thanks, the example is good. Now I am starting to understand the problem.

          > The problem was that we filter the directory after we get the list reported from the namenode

          Who filters out the results? I thought that /path/to/important/dir is passed to the name-node, so it should be looking for corrupt files only in this directory, and return as many corrupt files in it as the limit allows.

          Rodrigo Schmidt added a comment -

          That's the standard fsck call.

          The -list-corruptfiles makes the getCorruptFiles() call to the namenode. This call eventually looks at the needed_replication queue, which doesn't know about directories, paths, or files. It picks up all blocks that are in the "missing" queue, finds their INodes, gets their paths, and truncates the output according to the limit to be returned. The filter is applied only after the getCorruptFiles() call returns.

          I think it's possible to change the API to pass the specific path you are interested in. I just created HDFS-1265 for that.

          However, the truncation problem addressed by this JIRA will remain. I think it's good for users to know whether the list is complete or not.

          Konstantin Shvachko added a comment -

          Rodrigo, do I understand correctly that in the case you describe the output will be an empty list, because everything will be filtered out (same as before), and "A FEW" will indicate that more corrupt files are somewhere in the system? It does not seem very informative or user friendly to me. What do you think?

          > I just created HDFS-1265 for that.

          I think it is better to come to some general idea of how things should work before creating jiras. Maybe you will decide to fix the problem here. BTW, do you plan to complete this new jira?

          Rodrigo Schmidt added a comment -

          If the list is empty, we don't print the "ALL/A FEW" line:

                  if (matchedCorruptFilesCount == 1 ) {
                    out.println("Here are" + filler + "corrupted files:");
                    out.println("===========================================");
                  }
          

          If the list is incomplete (empty or not), we print out a header alerting the user:

              if (!corruptFileStatuses.isComplete()) {
                filler = " A FEW ";
                out.println("\n\nATTENTION: List of corrupted files returned from" +
                            " namenode was INCOMPLETE.\n\n");
              }
          

          I thought that would be enough but I'm open to other options.
          What else would you like to add?

          As for creating JIRAs, I guess I was just trying to be a little proactive, but maybe it was wrong. My rationale was the following:
          1) These are orthogonal problems (incomplete lists, and server-side filtering)
          2) The current patch for this JIRA is already long and complicated. Extending it would increase the chances of introducing bugs.
          3) Blocking one change should not necessarily block the other, thus calling for a separate JIRA.

          I assigned HDFS-1265 to myself because I've been dealing with the getCorruptFiles() API since its creation. I assumed I would probably be the one working on it anyway, though I don't plan to do this in the next few days. If you think it should be left unassigned or deleted, I don't mind.

          Konstantin Shvachko added a comment -

          I was addressing the example you gave.
          > Wee ran fsck -list-corruptfiles /path/to/important/dir but it returned an empty list.
          > The problem was that we filter the directory after we get the list reported from the namenode and the list is limited.

          What do you print if the list is empty, but incomplete? Looks like it is going to be only the message
          ATTENTION: List of corrupted files returned from namenode was INCOMPLETE.
          and no list. I think this is confusing.
          And the only way to get it done is to pass the directory path to getCorruptFiles("/path/to/important/dir").

          I browsed through the issues dedicated to the -list-corruptfiles option for fsck.

          • First of all, I think it should have been done in one issue, with a proper design of the new feature and all UI / API issues thought through in advance, rather than doing it gradually. I sure don't know whether it could have been done that way, but it seems more convenient to discuss everything in one place rather than jumping all over the place.
          • Second of all, as a result of that (I believe) an unnecessary ClientProtocol method was introduced: getCorruptFiles(), which is also being modified here unnecessarily.

          ClientProtocol changes are not necessary because fsck works over http rather than via rpc. NamenodeFsck - a part of FsckServlet - calls name-node methods directly, rather than through rpc. Therefore, ClientProtocol has nothing to do with this. For example, getBlockLocationsNoATime() is not in ClientProtocol. The same should hold for getCorruptFiles().

          So I propose to remove getCorruptFiles() from ClientProtocol in this jira instead of modifying it...
          And then I won't argue about printout messages anymore...

          I believe you will still need HDFS-1265 to deal with your example correctly.

          Rodrigo Schmidt added a comment -

          If I remember correctly, there is another message when the list is empty saying that no corrupt files were found. That part of the code was not changed by this patch.

          The -list-corruptfiles option on fsck emerged after a number of incremental changes, in the very organic way open source works.

          The reason why getCorruptFiles() belongs to ClientProtocol is to allow external services to query it. We are working on having the RaidNode use this call to automatically recover corrupted blocks. I guess this shows we did think some things through.

          On a design related note, if we want to build a truly distributed namenode, we should be thinking about taking things out of it, like fsck itself.

          Last, HDFS-1265 seems to be an orthogonal issue, since it doesn't guarantee the list is complete. Adding filters DOES NOT prevent the queried directory from having thousands of corrupted files, out of which, only a few hundred will be reported.

          Konstantin Shvachko added a comment -

          > If I remember correctly, there is another message
          So what exactly will it print under the conditions (of your example) above?

          I see that getCorruptFiles() was introduced in HDFS-729 as a part of fsck changes related to -list-corruptfiles. And it is not necessary for fsck. Do you agree with that?

          RaidNode has not been mentioned before, neither here nor in HDFS-729. ClientProtocol is an HDFS-private API according to the recently adopted classification. This means that external tools should not use or depend on it.

          Rodrigo Schmidt added a comment -

          The RaidNode extension is not a reason, it's an example of a tool that could benefit from that design.

          The meaning of "external" is context-dependent. I meant "external to the namenode", you assumed "external to HDFS".

          I would like to understand better the motivation behind this discussion before proceeding. I wouldn't like to transform this into a long thread about details and history behind HDFS-729.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12446698/HADFS-1111.0.patch
          against trunk revision 957669.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/194/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/194/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/194/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/194/console

          This message is automatically generated.

          Sanjay Radia added a comment -

          Q. Is the RaidNode accessing the functionality via RPC directly, or via a method that was added to Hdfs and DistributedFileSystem?
          RaidNode should not be accessing the functionality directly via RPC - RPCs are internal interfaces.
          Further, if you believe this functionality is useful to add to Hdfs and DistributedFileSystem, please make the case (I believe one could make such a case).
          When adding special hooks for private or external tools, one should make the case that such hooks are generally useful.
          I realize that a previous Jira has added this functionality; but you were the author of the previous jira and so should be able to make the case.

          Rodrigo Schmidt added a comment -

          The RaidNode is not currently using this API, although its use was one of the motivations I had for adding getCorruptFiles() to ClientProtocol. Originally, raid was part of HDFS and I could certainly see how Raid (and possibly other parts of HDFS) could benefit from this as an RPC to the namenode. I thought the others saw it too because when I got to HDFS-729, having getCorruptFiles() on ClientProtocol was not under discussion anymore.

          The JIRA that is responsible for making the RaidNode call getCorruptFiles is HDFS-1171. Most probably we will have to extend DistributedFileSystem to export getCorruptFiles(). That's why I said we don't have to be external to HDFS, but we can be external to the namenode.

          On the other hand, if we do take getCorruptFiles() out of ClientProtocol, we will make HDFS-1171 overly complicated or expensive.

          I really think the correct design choice is to export basic APIs like getCorruptFiles() as RPCs and build services like fsck and raid completely outside the namenode. After looking at the fsck code from the inside out, and having seen how it can sometimes compromise the whole filesystem because the namenode is using most of its resources to calculate outputs for fsck requests, I'm convinced it should be outside the namenode. For the sake of horizontal scalability of the namenode, we should be working on redesigning things like the current fsck implementation, instead of reinforcing it.

          That's what I meant when I said we should be taking things out of the namenode. In my opinion, even if my case about having other parts of HDFS call getCorruptFiles() is not convincing, taking it out of ClientProtocol only reinforces the design choice of running fsck inside the namenode, which I think is bad. As we have more and more discussions about a distributed namenode, things like fsck should be the first ones running externally to it (to the namenode, not to HDFS). I see this as a low-hanging fruit towards a more scalable and distributed namenode.

          Sanjay Radia added a comment -

          >I really think the correct design choice is to export basic APIs like getCorruptFiles() as RPCs.
          I suspect you have a misunderstanding of how the client side connects via RPC.
          We have no plans to expose the RPCs directly for now.
          In order to allow tools to access such functionality it is not necessary to use the RPC directly; Hdfs and DistributedFileSystem (which extend AbstractFileSystem and FileSystem) are effectively the client side library to access a NN.
          >On the other hand, if we do take getCorruptFiles() out of ClientProtocol, we will make HDFS-1171 overly complicated or expensive.
          Not if you add the method to Hdfs and DistributedFileSystem.
          You simply need to make the case for adding getCorruptFiles to these two classes. It appears that this functionality got slipped in as part of HDFS-1171.

          Rodrigo Schmidt added a comment -

          I thought DistributedFileSystem and Hdfs classes contact the namenode via RPC, using ClientProtocol. Maybe I'm missing something but I think that even if we change Hdfs and DistributedFileSystem, getCorruptFiles() will have to be part of ClientProtocol.

          Rodrigo Schmidt added a comment -

          Sanjay, Konstantin: Did I address all your questions? Can we move on with this patch?

          Konstantin Shvachko added a comment -

          There are several things that went wrong with HDFS-729. And I think this jira should take care of them.

          1. The API for getting corrupt files is incomplete.
            • There is no way to tell whether the corrupt files returned by fsck are all the corrupt files or whether there are more. This jira directly addresses that problem.
            • If there are more, there is no way to receive the exhaustive list of corrupt files by repeatedly calling getCorruptFiles(). This is the next question people ask in regard to this feature.
            • Filtering over a sub-tree. The subject of HDFS-1265.
          2. Synchronization problem. HDFS-729 introduced UnderReplicatedBlocks.getQueue(level), which returns a pointer to an internal queue, which opens the way for unsynchronized usage of this collection.
          3. Unnecessary ClientProtocol changes. I still prefer to think that the changes were just a mistake rather than an attempt to silently squeeze in some changes for external tools. One way or another, the wire protocol changes should have an important use case. Right now you have not made a clear case for that, as Sanjay pointed out.
          Sriram Rao added a comment -

          After further discussions with Konstantin, one way to address this problem with the API is to have the client provide the starting block id:
          1. As long as there are corrupt blocks, have fsck return pairs of the form <block id>, <pathname>
          2. In a subsequent call, the client passes back the last corrupt block id; fsck then uses that block id as the starting point for the next list
          3. This process iterates until there are no more corrupt blocks, at which point fsck returns "There are no more corrupt blocks"

          This is similar in spirit to getListing().
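
          A minimal sketch of the client-side loop this implies (the method name, entry type, and cookie handling below are illustrative assumptions, not the actual patch):

            // Illustrative paging loop: keep asking the namenode for corrupt blocks,
            // resuming each call after the last block id seen, until nothing comes back.
            final class CorruptBlockLister {
              interface Entry { long blockId(); String path(); }
              interface Namenode { Entry[] listCorruptFileBlocks(String path, Long startAfter); }

              static void listAll(Namenode nn, String path) {
                Long startAfter = null;                 // null => start from the first corrupt block
                for (;;) {
                  Entry[] batch = nn.listCorruptFileBlocks(path, startAfter);
                  if (batch.length == 0) {              // exhausted: no more corrupt blocks
                    System.out.println("There are no more corrupt blocks");
                    return;
                  }
                  for (Entry e : batch) {
                    System.out.println(e.blockId() + "\t" + e.path());
                  }
                  startAfter = batch[batch.length - 1].blockId();  // resume after the last id seen
                }
              }
            }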

          I'll provide a patch which also addresses the synchronization problem that Konstantin is referring to.

          Rodrigo Schmidt added a comment -

          Getting a continuous flow of corrupted blocks is a hard problem because these data structures can change in between calls. Part of the long discussion on HDFS-729 was about that.

          I still think that removing the ClientProtocol changes is a step backwards. Instead of taking it out, it is better to add it to the Hdfs or DistributedFileSystem API and allow other services to query it. The best example I have of a service that could benefit from it is HDFS-RAID. RAID is at a very advanced stage now. Ram and Scott have just implemented Reed-Solomon codes. Not having an API to query corrupted blocks directly makes simple things very difficult.

          I also don't think it's a good idea to solve different and orthogonal issues on the same JIRA. I think JIRAs should be as small and simple as possible to make discussions and reviews easier.

          Konstantin Shvachko added a comment -

          I see there is a reference to my participation in HDFS-729, so there is nobody to blame but myself.

          I think the lesson with listing directories taught us some things. And it has the same issue: we do not guarantee that we list all directory entries as a single snapshot, because there could be too many of them. We only guarantee to return the current consecutive list of N entries following the specified name. The rest may have changed by the time the list of N is displayed.

          With Sriram's approach we actually list blocks of corrupted files and provide info about files they belong to. This is different from the previously discussed approach.

          • So I propose to rename the method and the respective fsck option to listCorruptFileBlocks instead of listCorruptFile.

          The paging in Sriram's proposal is done by blockId. Since the blocks in the UnderReplicatedBlocks queues are ordered by blockId, this provides more natural paging semantics than "skip K and return the next N" - one of the variants considered before. Paging by blockId is in a sense the same as in listing dirs. Fsck guarantees to return a consecutive list of N corrupt blocks greater than the given id.

          ClientProtocol changes. My point is that any new feature included in the code needs to be supported, which is not free. And supporting a feature which is not used by anybody is particularly inefficient and even frustrating, not that we don't already have some of those.
          RAID may be a good use case for this API, but I agree with Rodrigo that it's a topic for a different discussion and we should take it out of this issue. I sure do not have enough context, but maybe RAID can query the NN for corrupt blocks the same way fsck does.

          Rodrigo Schmidt added a comment -

          One other thought I have about paging is that it might introduce unnecessary complexity. The number of corrupted files is usually low. If it's too high, it might be better to run a full fsck. But if there is a strong case for paging, I'm fine with it.

          ClientProtocol vs. jsp: As I mentioned before, I'm opposed to the fsck strategy because it increases the load on the namenode. I've seen a complete cluster with thousands of nodes almost go down because there were parallel executions of fsck running internally to the namenode and they couldn't be stopped. Besides that, using HTTP to get data from the namenode is just another way to implement an RPC. The advantage of JSP is that it allows for longer or more dynamic outputs, which is not the case here. I'm fine moving this specific topic to another JIRA or discussion list.

          Sriram Rao added a comment -

          The case for paging was made by you in one of the JIRAs on this issue. You went looking for the list of corrupt files in an important dir and found that the 500 limit was getting in the way.

          The patch that you have done has the namenode doing the filtering (and this has caused problems).

          What we are proposing instead, is to have the namenode return a list of corrupt files to the client and then let the client do the filtering. The way we envision using this feature is via an iterative approach to fixing corruption:
          1. get a list of corrupt files for a certain path
          2. fix up the corrupt files in that path
          3. iterate; stop if the list of corrupt files is empty

          By being iterative, this proposal also addresses one of the issues you had brought up: namely, the list of corrupt files can change between successive paging calls.

          Fsck is a fall-back. With the PBs of data that we have in our clusters, a full fsck does take a few hours to finish.

          Sriram Rao added a comment -

          Patch that fixes the issues with listCorruptFilesBlocks(). It fixes a bug in the previous patch and adds support for "paging", where the client can call back to get the list of corrupt files/blocks since the previous call.

          This patch is for Yahoo's version of Hadoop-20.

          Konstantin Shvachko added a comment -

          The patch looks good. A couple of nits.

          1. I don't think we should have a new configuration variable for the number of corrupt blocks to return.
            We should just use the constant you introduced. I like the name MAX_CORRUPT_FILE_BLOCKS_RETURNED.
            The value is set to 100, which is fine with me. Please speak up if somebody has other values in mind.
            So the variable maxListCorruptFilesBlocksReturned will not be necessary.
          2. Consulted with Rob about the name listCorruptFilesBlocks(). Plural Files in the name doesn't sound right,
            as it is not clear whether we return files or blocks. Would be good to change it to listCorruptFileBlocks() throughout the code.
          3. In NamenodeFsck.listCorruptFilesBlocks() the printout is not correct. It will read one of the following:
            "The filesystem under path '/tmp' has 57 is CORRUPT files"
            "The filesystem under path '/tmp' has no is CORRUPT files"
            

            This should be rephrased. Also the last message can be confusing if you already returned some files before.
            We should probably distinguish and say "has no CORRUPT files" if startBlockAfter == null, and
            "has no more CORRUPT files" otherwise.

          Sriram Rao added a comment -

          @Konstantin:

          I'll fix most of the nits you have pointed out apart from this one: maxListCorruptFilesBlocksReturned
          Having this variable simplifies testing.

          Sriram Rao added a comment -

          Attached is a patch that addresses the issues pointed to by Konstantin.

          This patch also includes the CLI change (the change to DFSck) to allow a user to run list-corruptfileblocks <path> and get back the full list of files with the missing block info.

          This patch is for Yahoo Hadoop-20.

          Sriram Rao added a comment -

          Patch attached for Yahoo Hadoop-20 (that addresses Konstantin's comments and the associated CLI changes).

          Sriram Rao added a comment -

          Patch described in my last update is attached.

          This patch is for Yahoo Hadoop-20.

          Konstantin Shvachko added a comment -

          +1 on the last y20 patch.

          Sriram told me he is working on a patch for trunk.
          As I proposed earlier, it would be good to remove the unnecessary ClientProtocol method getCorruptFiles(), as it is not used anywhere and it would be a waste of resources to support it. It is a public method, so if people think we need to vote on this, please raise the question.

          dhruba borthakur added a comment -

          There is an application (RaidNode) that needs to detect missing blocks and fix them as soon as possible. The design is such that it should find missing blocks within minutes (rather than hours).

          1. The getCorruptFiles() method satisfies this goal perfectly. This can be enhanced to add a method in DistributedFileSystem to expose this API in a more formal fashion.

          2. Another alternative would be to remove NameNode.getCorruptFiles() and add it to fsck.

          3. An alternative would be to introduce a callback mechanism registered with the NN; this callback would be invoked when the NN detects missing blocks.

          I prefer approach 1 because it is more direct and consumes fewer resources than option 2. Option 3 is very heavyweight.

          Jakob Homan added a comment -

          Canceling patch since there is no trunk patch available for review and commit.

          Sriram Rao added a comment -

          Attached is a patch for trunk: it adds a new API, -list-corruptFileBlocks, that provides the list of pairs of the form "<blockid>\t<pathname>".
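          For reference, each output line could be consumed as in the snippet below; this is an illustrative parser with a made-up sample line, not part of the patch.

            import java.io.BufferedReader;
            import java.io.IOException;
            import java.io.StringReader;

            public class CorruptListParser {
              public static void main(String[] args) throws IOException {
                // Hypothetical sample of the "<blockid>\t<pathname>" format.
                String sample = "blk_1234567890\t/user/foo/part-00000\n";
                BufferedReader in = new BufferedReader(new StringReader(sample));
                String line;
                while ((line = in.readLine()) != null) {
                  String[] fields = line.split("\t", 2);
                  if (fields.length == 2) {
                    System.out.println("block=" + fields[0] + " file=" + fields[1]);
                  }
                }
              }
            }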

          Konstantin Shvachko added a comment -

          The patch review comments.

          FSNamesystem

          1. CorruptFileBlockInfo should be a static non-public class.
          2. I'd rather make it store the file path and the block itself, rather than its name.
            The block name can be constructed in Fsck during printing.
            static class CorruptFileBlockInfo {
              String path;
              Block block;
            
              public String toString() {
                return block.getBlockName() + "\t" + path;
              }
            }
            

            The method may be used in other tools, as Dhruba stated, so it is better to keep the original structures.

          3. Methods listCorruptFileBlocks() should return Collection<CorruptFileBlockInfo> rather than CorruptFileBlockInfo[]. Saves conversion to an array.
          4. NamenodeFsck unused imports: FileStatus, Path
          5. TestFileCorruption unused imports: FileStatus, ClientProtocol
          6. DFSck unused imports: DFSConfigKeys
          Konstantin Shvachko added a comment -

          Dhruba. I'd be glad to discuss how to make corrupted blocks available to RaidNode in a subsequent jira.
          Currently the method in ClientProtocol lacks the functionality stated in this jira, and does not conform to the design proposed by Sriram. So it has to be replaced by another method or removed. As it is not used anywhere in the code, I'd prefer that it is removed from the protocol until the exact API is developed. It looks like this effort should be coordinated with MAPREDUCE-2036 so that both RAID solutions could benefit from it.
          Would you agree?

          dhruba borthakur added a comment -

          The patch posted by Sriram looks good; however, I would add the new API listCorruptFilesAndBlocks to ClientProtocol so that tools can use it.

          Konstantin Shvachko added a comment -

          Tools cannot use ClientProtocol directly as discussed above. More changes will be required to make it usable. Can it be done in another jira?

          dhruba borthakur added a comment -

          > More changes will be required to make it usable

          What are those? Can you please list them (again)? Thanks.

          Konstantin Shvachko added a comment -

          These two comments by Sanjay and me should summarize the discussion about ClientProtocol changes.
          ClientProtocol, DFSClient, DistributedFileSystem, FileSystem, FileContext... - these classes may be affected by the changes.
          The main problem is that

          • there is no clear idea / description of how this api will be exposed to RaidNode,
          • therefore, it is not clear what is the scope of the changes,
          • the changes intended for RaidNode are not related to Fsck, and shouldn't be mixed in with this functionality.

          You say (plural) tools; do you have tools other than RaidNode in mind?

          Ramkumar Vadali added a comment -

          The RaidNode use case at a high level is to identify corrupted data that can be fixed by using parity data.

          This can be achieved by:
          1. Getting a list of corrupt files and subsequently identifying the corrupt blocks in each corrupt file. The current getCorruptFiles() RPC enables getting the list of corrupt files.
          OR
          2. Getting a list of corrupt files annotated by the corrupt blocks. If this patch introduced a RPC with that functionality, it would be an improvement over the getCorruptFiles() RPC.

          I have a patch for https://issues.apache.org/jira/browse/HDFS-1171 that depends on the getCorruptFiles() RPC, so removal of that RPC with no substitute would mean loss of functionality.
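          As an illustration of option 2, a consumer could group the annotated (block, file) pairs by file before scheduling parity-based repair. The sketch below uses hypothetical types and is not RaidNode code.

            import java.util.ArrayList;
            import java.util.HashMap;
            import java.util.List;
            import java.util.Map;

            public class CorruptBlockGrouper {
              /** pair[0] = block name, pair[1] = file path; returns file -> corrupt blocks. */
              static Map<String, List<String>> groupByFile(List<String[]> blockFilePairs) {
                Map<String, List<String>> byFile = new HashMap<String, List<String>>();
                for (String[] pair : blockFilePairs) {
                  List<String> blocks = byFile.get(pair[1]);
                  if (blocks == null) {
                    blocks = new ArrayList<String>();
                    byFile.put(pair[1], blocks);
                  }
                  blocks.add(pair[0]);
                }
                return byFile;
              }
            }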

          Konstantin Shvachko added a comment -

          I don't see the patch. And I don't know how the RPC alone can enable the functionality. I think this discussion should go on within HDFS-1171, where you can make a case as Sanjay suggested for introducing the RPC, which has never been done yet. What is wrong with adding the call in HDFS-1171?

          Sriram Rao added a comment -

          Attached is an updated patch that addresses all the issues that Konstantin had pointed out. Thanks Konstantin.

          I also refactored the tests in TestFileCorruption.java that were related to this issue (listCorruptFileBlocks) and made them a separate test (TestListCorruptFileBlocks.java) that tests this feature.

          Sriram Rao added a comment -

          Re-attaching the original patch. Deleted the wrong one.

          Sriram Rao added a comment -

          Re-attaching the patch that addresses the comments from Konstantin and includes the test case that was added for this issue.

          Konstantin Shvachko added a comment -

          +1 The patch looks good to me.
          Hudson is down, Sriram could you please run test and test-patch targets, and post the summary here.
          I plan to commit it after that.

          dhruba borthakur added a comment -

          I am -1 for committing this change.

          I think we all agree that this functionality is going to be useful for a variety of tools. One of the tools is the existing fsck process. The reason fsck uses a servlet is that it is long running; otherwise Hadoop RPCs would time out. But there is no such requirement that this new option added to fsck should also use the servlet: "fsck -listcorruptfiles" is not a long-running process.

          In fact, I would argue that "fsck -listcorruptfiles" should itself use the DistributedFileSystem API.

          dhruba borthakur added a comment -

          I had an extended offline discussion with Konstantin. The meat of the conversation is that RaidNode is designed to poll to find missing blocks in HDFS very frequently. This means that invoking this feature via a servlet is going to be a resource bottleneck for the RaidNode. It would be really elegant for the RaidNode to be able to invoke a method in the DistributedFileSystem to find missing blocks. Another point of discussion is that "fsck" is yet another tool that can best use the existing APIs (via DistributedFileSystem) rather than using internal interfaces in FSNamesystem.

          I like the elegance of the new API (much better than the existing interface that is being deleted) and if we can add it to the DistributedFileSystem then that will be great!

          Sriram Rao added a comment -

          @Dhruba---Glad to hear that you like the new API. So, from this, can I ask that you +1 this change? The DistributedFileSystem change sounds like a separate JIRA.

          Thanks!

          Sriram

          dhruba borthakur added a comment -

          Hi Sriram, my vote would be to expose the new API via the DistributedFileSystem and then make fsck use that instead of going through the servlet. Will it be possible for you to make this change?

          Suresh Srinivas added a comment -

          Dhruba - the API should be added to FileSystem and throw UnsupportedOperationException from file systems that do not support this.
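          A minimal sketch of that pattern is below; the method name and signature are placeholders, since the actual FileSystem API was left to a follow-up JIRA.

            import java.io.IOException;
            import java.util.Collection;
            import java.util.Collections;

            abstract class BaseFileSystem {
              /** File systems that cannot enumerate corrupt blocks inherit this default. */
              public Collection<String> listCorruptFileBlocks(String path) throws IOException {
                throw new UnsupportedOperationException(
                    getClass().getSimpleName() + " does not support listCorruptFileBlocks");
              }
            }

            class DistributedFileSystemSketch extends BaseFileSystem {
              @Override
              public Collection<String> listCorruptFileBlocks(String path) throws IOException {
                // A real implementation would delegate to the namenode.
                return Collections.emptyList();
              }
            }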

          dhruba borthakur added a comment -

          > The API should be added to FileSystem

          This idea sounds fine to me.

          Sriram Rao added a comment -

          @Dhruba/Suresh,

          Given that we need to add a new API to FileSystem, that sounds like a new JIRA to me. I'd be happy to do it. That said, I'd like to get closure on this issue.

          @Dhruba---Since you seem to be in agreement with the changes for this JIRA, can you make it a +1?

          Suresh Srinivas added a comment -

          I agree this should be done as a separate jira.

          dhruba borthakur added a comment -

          I withdraw my veto.

          +0

          Sriram Rao added a comment -

          Previous patch got stale (due to other checkins). Uploading a new patch.

          Konstantin Shvachko added a comment -

          Made two cosmetic changes:

          1. Removed a redundant import in ClientProtocol.
          2. Converted the description of TestListCorruptFileBlocks to JavaDoc.
          Konstantin Shvachko added a comment -

          I just committed this.
          Thank you Sriram.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #385 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/385/)
          HDFS-1111. Introduce getCorruptFileBlocks() for fsck. Contributed by Sriram Rao.


            People

            • Assignee: Sriram Rao
            • Reporter: Rodrigo Schmidt
            • Votes: 0
            • Watchers: 8
