Hadoop HDFS / HDFS-8797

WebHdfsFileSystem creates too many connections for pread

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      While running a test, we found that WebHdfsFileSystem can create several thousand connections when doing a positional read (pread) of a 200MB file. For each connection, the client connects to the DataNode again, and the DataNode creates a new DFSClient instance to handle the read request. This also leads to several thousand getBlockLocations calls to the NameNode.

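      The cause of the issue is that in FSInputStream#read(long, byte[], int, int), each time the input stream reads some data, it seeks back to the old position and resets its state to SEEK. Thus the next read has to re-establish the connection. Because the positional readFully calls this method in a loop until the buffer is filled, one large positional read turns into many short reads, each on a fresh connection.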

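        public int read(long position, byte[] buffer, int offset, int length)
          throws IOException {
          synchronized (this) {
            long oldPos = getPos();
            int nread = -1;
            try {
              // Move to the requested offset and perform a single read.
              seek(position);
              nread = read(buffer, offset, length);
            } finally {
              // Restore the original position. For WebHDFS this puts the stream
              // back into the SEEK state, so the next read reconnects.
              seek(oldPos);
            }
            return nread;
          }
        }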
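      To make the effect concrete, here is a minimal, dependency-free Java model of the behavior described above. It is a sketch under stated assumptions, not WebHdfsFileSystem code: the names (PreadConnectionDemo, SimulatedHttpStream, pread, readFullyRange), the 4096-byte short-read size, and the 1 MiB file are all hypothetical. readFullyRange illustrates one possible way to avoid reconnecting (a single dedicated connection per positional read) and is not necessarily what the attached patches implement.

```java
import java.io.IOException;
import java.util.Arrays;

// Hypothetical model, not Hadoop code: mimics the OPEN/SEEK behavior described
// in the issue and counts how many "connections" a positional read opens.
public class PreadConnectionDemo {

  // Assumed upper bound on what one read() returns, like a short socket read.
  static final int CHUNK = 4096;

  static class SimulatedHttpStream {
    private final byte[] file;     // stands in for the remote file
    private long pos = 0;          // current sequential position
    private boolean open = false;  // false == SEEK state: next read() reconnects
    int connections = 0;           // simulated connections opened so far

    SimulatedHttpStream(byte[] file) { this.file = file; }

    long getPos() { return pos; }

    void seek(long newPos) {
      if (newPos != pos) {         // moving the position drops the connection
        pos = newPos;
        open = false;
      }
    }

    int read(byte[] buf, int off, int len) {
      if (pos >= file.length) return -1;
      if (!open) {                 // reconnect at the current offset
        connections++;
        open = true;
      }
      int n = (int) Math.min(Math.min(len, CHUNK), file.length - pos);
      System.arraycopy(file, (int) pos, buf, off, n);
      pos += n;
      return n;
    }

    // Seek-based pread, as in the FSInputStream snippet above.
    synchronized int pread(long position, byte[] buf, int off, int len) {
      long oldPos = getPos();
      try {
        seek(position);
        return read(buf, off, len);
      } finally {
        seek(oldPos);              // back to the SEEK state
      }
    }

    // Loops over pread until the range is filled, like a positional readFully.
    void readFully(long position, byte[] buf, int off, int len) throws IOException {
      int done = 0;
      while (done < len) {
        int n = pread(position + done, buf, off + done, len - done);
        if (n < 0) throw new IOException("EOF before the range was filled");
        done += n;
      }
    }

    // Hypothetical alternative: one dedicated connection per positional read,
    // drained in CHUNK-sized pieces; the sequential position is not touched.
    void readFullyRange(long position, byte[] buf, int off, int len) throws IOException {
      if (position < 0 || position + len > file.length) {
        throw new IOException("range is outside the file");
      }
      connections++;
      for (int done = 0; done < len; ) {
        int n = Math.min(CHUNK, len - done);
        System.arraycopy(file, (int) position + done, buf, off + done, n);
        done += n;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] file = new byte[1 << 20];           // 1 MiB sample "file"
    for (int i = 0; i < file.length; i++) file[i] = (byte) i;

    SimulatedHttpStream seekBased = new SimulatedHttpStream(file);
    byte[] out1 = new byte[file.length];
    seekBased.readFully(0, out1, 0, out1.length);

    SimulatedHttpStream rangeBased = new SimulatedHttpStream(file);
    byte[] out2 = new byte[file.length];
    rangeBased.readFullyRange(0, out2, 0, out2.length);

    System.out.println("seek-based pread connections: " + seekBased.connections); // 256
    System.out.println("range pread connections: " + rangeBased.connections);     // 1
    System.out.println("data matches: " + (Arrays.equals(out1, file) && Arrays.equals(out2, file)));
  }
}
```

      With these assumed values, the seek-based path opens one connection per 4096-byte read (256 for 1 MiB), while the range-based path opens one. Real counts depend on how many bytes each underlying read actually returns.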
      

        Attachments

        1. HDFS-8797.003.patch (11 kB, Jing Zhao)
        2. HDFS-8797.002.patch (11 kB, Jing Zhao)
        3. HDFS-8797.001.patch (8 kB, Jing Zhao)
        4. HDFS-8797.000.patch (5 kB, Jing Zhao)


            People

            • Assignee:
              jingzhao Jing Zhao
              Reporter:
              jingzhao Jing Zhao
            • Votes:
              0
              Watchers:
              9

              Dates

              • Created:
                Updated:
                Resolved: