Hadoop Common
  1. Hadoop Common
  2. HADOOP-3239

exists() calls logs FileNotFoundException in namenode log

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      getFileInfo returns null for File not found instead of throwing FileNotFoundException

      Description

      exists() was modified to invoke getFileStatus() internally. But getFileStatus() throws FileNotFoundException for files which does not exists and this is logged in RPC$Server for each exists() call. One way to get rid of these messages is at the Name Node, catch FileNotFoundException and return null. In this case, RPC would not log it in namenode log. But at the client end we might have to check for null on all calls of getFileStatus(). Other option at client end is to construct FileNotFoundException() when getFileInfo() returns null.

      1. HADOOP-3239-2.patch
        8 kB
        Lohit Vijayarenu
      2. HADOOP-3239-1.patch
        8 kB
        Lohit Vijayarenu

        Activity

        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #461 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/461/ )
        Hide
        Konstantin Shvachko added a comment -

        I just committed this. Thank you Lohit.

        Show
        Konstantin Shvachko added a comment - I just committed this. Thank you Lohit.
        Hide
        Raghu Angadi added a comment -

        +1.

        Show
        Raghu Angadi added a comment - +1.
        Hide
        Lohit Vijayarenu added a comment - - edited

        Thanks Doug and Konstantin. Attached patch modifies javadoc as suggested by Konstantin

        Show
        Lohit Vijayarenu added a comment - - edited Thanks Doug and Konstantin. Attached patch modifies javadoc as suggested by Konstantin
        Hide
        Konstantin Shvachko added a comment -

        +1
        In JavaDoc the following

           * @return object containing information regarding the file
           *         null if file not found
        

        will look like

        Returns:
        	object containing information regarding the file null if file not found
        

        I'd say "or null if..." or use a comma.

        Show
        Konstantin Shvachko added a comment - +1 In JavaDoc the following * @ return object containing information regarding the file * null if file not found will look like Returns: object containing information regarding the file null if file not found I'd say "or null if..." or use a comma.
        Hide
        Doug Cutting added a comment -

        +1 This looks good to me.

        Show
        Doug Cutting added a comment - +1 This looks good to me.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12379975/HADOOP-3239-1.patch
        against trunk revision 645773.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 3 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2217/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2217/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2217/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2217/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12379975/HADOOP-3239-1.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 3 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2217/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2217/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2217/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2217/console This message is automatically generated.
        Hide
        Lohit Vijayarenu added a comment -

        Attached patch does these

        • getFileInfo at namenode returns null if file not found
        • getFileInfo at DFSClient passes null above as is
        • exists uses null to identify if file exists
        • getFileStatus checks for null and throws FileNotFoundException
        • added a testcase to make sure getFileInfo returns null for non existing files
        Show
        Lohit Vijayarenu added a comment - Attached patch does these getFileInfo at namenode returns null if file not found getFileInfo at DFSClient passes null above as is exists uses null to identify if file exists getFileStatus checks for null and throws FileNotFoundException added a testcase to make sure getFileInfo returns null for non existing files
        Hide
        Raghu Angadi added a comment - - edited

        You are right. What was changed recently was to throw FileNotFoundException rather than IOException. The extra logging is because of DFSClient.exists() using getFileInfo().

        I still think eventually it would be nice to be able to tell RPC not to log user exceptions or be able to tell which ones to log.

        Show
        Raghu Angadi added a comment - - edited You are right. What was changed recently was to throw FileNotFoundException rather than IOException. The extra logging is because of DFSClient.exists() using getFileInfo(). I still think eventually it would be nice to be able to tell RPC not to log user exceptions or be able to tell which ones to log.
        Hide
        Doug Cutting added a comment -

        This issue is about log messages generated by calls to exists() that didn't used to be generated by calls to exists(). It is a regression. It is not about namenode exception logging more generally. (About that, however, I think folks have found it convenient that whenever an exception is received from RPC that it is also logged on the server side.)

        > getFileInfo() used to return null.

        That's not true. It has always thrown an exception for this case, since that was always before the contract of its corresponding FileSystem methods. Now that it is overloaded to support exists() too, it should no longer throw the exception at this level.

        http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/java/org/apache/hadoop/dfs/FSDirectory.java?r1=549977&r2=549976&pathrev=549977

        Show
        Doug Cutting added a comment - This issue is about log messages generated by calls to exists() that didn't used to be generated by calls to exists(). It is a regression. It is not about namenode exception logging more generally. (About that, however, I think folks have found it convenient that whenever an exception is received from RPC that it is also logged on the server side.) > getFileInfo() used to return null. That's not true. It has always thrown an exception for this case, since that was always before the contract of its corresponding FileSystem methods. Now that it is overloaded to support exists() too, it should no longer throw the exception at this level. http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/java/org/apache/hadoop/dfs/FSDirectory.java?r1=549977&r2=549976&pathrev=549977
        Hide
        Raghu Angadi added a comment -

        Last comment . Also even if some logging is required for some operations, I think it should be logged at NameNode level and not at the RPC level, since only NameNode knows what it means.

        Show
        Raghu Angadi added a comment - Last comment . Also even if some logging is required for some operations, I think it should be logged at NameNode level and not at the RPC level, since only NameNode knows what it means.
        Hide
        Raghu Angadi added a comment -

        > E.g. the FileNotFoundException being thrown in case of open() needs to be logged while thrown in exists() should not be.
        I don't think FNF during open() should be logged either. It is not any more severe than FNF during existss(). In fact, many place we discouraged code that first invoked exists() and threw a FNF or opened the file depending on the result.

        Show
        Raghu Angadi added a comment - > E.g. the FileNotFoundException being thrown in case of open() needs to be logged while thrown in exists() should not be. I don't think FNF during open() should be logged either. It is not any more severe than FNF during existss(). In fact, many place we discouraged code that first invoked exists() and threw a FNF or opened the file depending on the result.
        Hide
        Raghu Angadi added a comment -

        Yes, I would prefer returning null from getFileInfo() as well. But I don't remember the reason why it was changed couple of months back.

        The main issue in this jira I think is about logging in RPC server at the NameNode. What I meant earlier was that returning null or throwing FileNotFound should be independent of RPC logging.

        Show
        Raghu Angadi added a comment - Yes, I would prefer returning null from getFileInfo() as well. But I don't remember the reason why it was changed couple of months back. The main issue in this jira I think is about logging in RPC server at the NameNode. What I meant earlier was that returning null or throwing FileNotFound should be independent of RPC logging.
        Hide
        Konstantin Shvachko added a comment -

        > should go in 0.17?

        Formally, it is not a regression in terms of performance or functionality.
        Informally, it is an annoying message, and we will have to answer questions until people will get used to it.
        I'd schedule it for 0.17.

        Show
        Konstantin Shvachko added a comment - > should go in 0.17? Formally, it is not a regression in terms of performance or functionality. Informally, it is an annoying message, and we will have to answer questions until people will get used to it. I'd schedule it for 0.17.
        Hide
        Konstantin Shvachko added a comment -

        > Is it enough if RPC does not log these?
        That would be enough if RPC could distinguish between exceptions that need to be logged and those that don't.
        But this is impossible in general. E.g. the FileNotFoundException being thrown in case of open() needs to be logged while thrown in exists() should not be.

        Show
        Konstantin Shvachko added a comment - > Is it enough if RPC does not log these? That would be enough if RPC could distinguish between exceptions that need to be logged and those that don't. But this is impossible in general. E.g. the FileNotFoundException being thrown in case of open() needs to be logged while thrown in exists() should not be.
        Hide
        Doug Cutting added a comment -

        > Is it enough if RPC does not log these?

        That sounds like a hack. If some consumers of getFileInfo() don't consider FileNotFound exceptional, then returning null is the right contract for that protocol method. As I've said before, using exception handling for normal control flow is a bad design. The exists() method should be able to return false via normal control flow.

        Show
        Doug Cutting added a comment - > Is it enough if RPC does not log these? That sounds like a hack. If some consumers of getFileInfo() don't consider FileNotFound exceptional, then returning null is the right contract for that protocol method. As I've said before, using exception handling for normal control flow is a bad design. The exists() method should be able to return false via normal control flow.
        Hide
        Raghu Angadi added a comment -

        Is it enough if RPC does not log these?

        Show
        Raghu Angadi added a comment - Is it enough if RPC does not log these?
        Hide
        Raghu Angadi added a comment -

        getFileInfo() used to return null. This was changed couple of months back during fixes to various shell commands and APIs. I am not sure the actual reason behind it, may be consistency. Returning null for non-existent files is good.

        Show
        Raghu Angadi added a comment - getFileInfo() used to return null. This was changed couple of months back during fixes to various shell commands and APIs. I am not sure the actual reason behind it, may be consistency. Returning null for non-existent files is good.
        Hide
        Konstantin Shvachko added a comment -

        I agree with Doug:

        • NameNode.getFileInfo() should return null not throwing the FileNotFoundException.
        • DFSClient.getFileInfo() should throw FileNotFoundException if the name-node returned null.
          We should also decide whether that should go in 0.17?
        Show
        Konstantin Shvachko added a comment - I agree with Doug: NameNode.getFileInfo() should return null not throwing the FileNotFoundException. DFSClient.getFileInfo() should throw FileNotFoundException if the name-node returned null. We should also decide whether that should go in 0.17?
        Hide
        Raghu Angadi added a comment -

        I don't think RPC should dictate what exception NameNode should throw. I think it should still throw FileNotFound. There is probably a way not to log selected exceptions at the RPC server. In fact the server should not log the exceptions that it is sending the client actually! may be except RuntimeExceptions.

        Show
        Raghu Angadi added a comment - I don't think RPC should dictate what exception NameNode should throw. I think it should still throw FileNotFound. There is probably a way not to log selected exceptions at the RPC server. In fact the server should not log the exceptions that it is sending the client actually! may be except RuntimeExceptions.
        Hide
        Doug Cutting added a comment -

        I think the namenode should return null for getFileInfo(), without having to catch an exception. And the client should throw an exception for null in getFileStatus() but not in exists().

        Show
        Doug Cutting added a comment - I think the namenode should return null for getFileInfo(), without having to catch an exception. And the client should throw an exception for null in getFileStatus() but not in exists().

          People

          • Assignee:
            Lohit Vijayarenu
            Reporter:
            Lohit Vijayarenu
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development