Hadoop HDFS / HDFS-9290

DFSClient#callAppend() is not backward compatible for slightly older NameNodes


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 2.7.2, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      HDFS-7210 combined the 2 RPC calls used at file append into a single one; specifically, getFileInfo() is combined with append(). While backward compatibility for older clients is handled by the new NameNode (via protobuf), a newer client's append() call does not work against older NameNodes. One will run into an exception like the following:

      java.lang.NullPointerException
              at org.apache.hadoop.hdfs.DFSOutputStream.isLazyPersist(DFSOutputStream.java:1741)
              at org.apache.hadoop.hdfs.DFSOutputStream.getChecksum4Compute(DFSOutputStream.java:1550)
              at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1560)
              at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1670)
              at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1717)
              at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1861)
              at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1922)
              at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1892)
              at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:340)
              at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:336)
              at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
              at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:336)
              at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:318)
              at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1164)
      

      The cause is that the new client code expects both the last block and the file info in the same RPC response, but the old NameNode replies with only the first. The exception itself does not reflect this, and one has to read the HDFS source code to understand what actually happened.

      We can have the client detect that it is talking to an old NameNode and send an extra getFileInfo() RPC. Alternatively, we should improve the exception being thrown so that it accurately reflects the cause of the failure.
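      A minimal sketch of the first option, using hypothetical stand-in types (the real fix would live in DFSClient#callAppend(), where the combined append() reply carries the file status): when the reply comes back without a file status, the client falls back to an extra getFileInfo() RPC, mirroring the pre-HDFS-7210 two-RPC behavior, and fails with a descriptive message instead of a bare NullPointerException.

      ```java
      import java.util.function.Supplier;

      public class AppendFallbackSketch {

        // Stand-in for HdfsFileStatus (hypothetical, illustration only).
        static class FileStatus {
          final long blockSize = 128L << 20;
        }

        // Stand-in for the combined append() reply; fileStatus is null
        // when the reply came from an old NameNode.
        static class AppendResult {
          final Object lastBlock;
          final FileStatus fileStatus;
          AppendResult(Object lastBlock, FileStatus fileStatus) {
            this.lastBlock = lastBlock;
            this.fileStatus = fileStatus;
          }
        }

        /**
         * Returns the file status for the appended file, issuing the extra
         * getFileInfo() RPC only when the combined append() reply lacks it.
         */
        static FileStatus resolveStatus(AppendResult result,
                                        Supplier<FileStatus> getFileInfo) {
          if (result.fileStatus != null) {
            return result.fileStatus;            // new NameNode: one RPC suffices
          }
          FileStatus status = getFileInfo.get(); // old NameNode: second RPC
          if (status == null) {
            // Name the real cause instead of letting a later NPE surface.
            throw new IllegalStateException(
                "NameNode returned no file status for append(); it may "
                + "predate HDFS-7210, or the file was deleted concurrently");
          }
          return status;
        }

        public static void main(String[] args) {
          FileStatus fromNN = new FileStatus();
          // New NameNode: status is already in the append() reply.
          if (resolveStatus(new AppendResult(new Object(), fromNN), () -> null) != fromNN) {
            throw new AssertionError("expected status from combined reply");
          }
          // Old NameNode: status missing, so the getFileInfo() fallback runs.
          if (resolveStatus(new AppendResult(new Object(), null), () -> fromNN) != fromNN) {
            throw new AssertionError("expected status from fallback RPC");
          }
          System.out.println("ok");
        }
      }
      ```

      The fallback costs one extra RPC only when talking to an old NameNode, so new-to-new deployments keep the single-RPC append path introduced by HDFS-7210.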

      Attachments

        1. HDFS-9290.001.patch
          1 kB
          Tony Wu
        2. HDFS-9290.002.patch
          1 kB
          Tony Wu

        Issue Links

        Activity


          People

            Assignee: Tony Wu (twu)
            Reporter: Tony Wu (twu)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:
