Hadoop HDFS / HDFS-8797

WebHdfsFileSystem creates too many connections for pread

Details

    • Reviewed

    Description

      While running a test, we found that WebHdfsFileSystem can create several thousand connections when doing positional reads (pread) of a 200MB file. For each connection, the client connects to the DataNode again, and the DataNode creates a new DFSClient instance to handle the read request. This also leads to several thousand getBlockLocations calls to the NameNode.

      The cause is that in FSInputStream#read(long, byte[], int, int), each time the input stream reads some data, it seeks back to the old position and resets its state to SEEK. The next read therefore has to re-establish the connection.

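        public int read(long position, byte[] buffer, int offset, int length)
          throws IOException {
          synchronized (this) {
            // Remember the current position so it can be restored afterwards.
            long oldPos = getPos();
            int nread = -1;
            try {
              seek(position);
              nread = read(buffer, offset, length);
            } finally {
              // Seeking back puts the stream into the SEEK state again,
              // so the next read has to open a new connection.
              seek(oldPos);
            }
            return nread;
          }
        }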
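
      To make the effect of the seek-read-seek-back pattern concrete, here is a minimal, self-contained sketch. It does not use any Hadoop classes: LazySeekStream, its needsReconnect flag (standing in for the SEEK state), and countConnections are hypothetical names invented for this illustration. The toy stream "opens a connection" on the first read after any seek, which is the behavior described above, and the positional read has the same shape as the method quoted from FSInputStream.

```java
public class PreadConnectionDemo {

    /** Toy stand-in (hypothetical) for a remote stream that reconnects lazily after a seek. */
    static class LazySeekStream {
        private final byte[] data;
        private long pos = 0;
        private boolean needsReconnect = true; // plays the role of the SEEK state
        int connectionsOpened = 0;

        LazySeekStream(byte[] data) {
            this.data = data;
        }

        long getPos() {
            return pos;
        }

        void seek(long newPos) {
            pos = newPos;
            needsReconnect = true; // any seek invalidates the current connection
        }

        /** Sequential read: opens a "connection" only if a seek happened since the last read. */
        int read(byte[] buf, int off, int len) {
            if (needsReconnect) {
                connectionsOpened++;
                needsReconnect = false;
            }
            if (pos >= data.length) {
                return -1;
            }
            int n = (int) Math.min(len, data.length - pos);
            System.arraycopy(data, (int) pos, buf, off, n);
            pos += n;
            return n;
        }

        /** Positional read with the same seek / read / seek-back shape as the quoted method. */
        synchronized int read(long position, byte[] buf, int off, int len) {
            long oldPos = getPos();
            try {
                seek(position);
                return read(buf, off, len);
            } finally {
                seek(oldPos); // forces a reconnect on the next read
            }
        }
    }

    /** Reads a file of fileSize bytes with positional reads of chunkSize bytes each. */
    static int countConnections(int fileSize, int chunkSize) {
        LazySeekStream in = new LazySeekStream(new byte[fileSize]);
        byte[] buf = new byte[chunkSize];
        for (long p = 0; p < fileSize; p += chunkSize) {
            in.read(p, buf, 0, chunkSize);
        }
        return in.connectionsOpened;
    }

    public static void main(String[] args) {
        // One connection per positional read: 1 MiB / 4 KiB = 256 connections.
        System.out.println(countConnections(1 << 20, 4096));
    }
}
```

      In this sketch every positional read costs one connection, so the count grows linearly with the number of preads. That matches the symptom reported here, although the real client involves the DataNode and NameNode round trips that the toy model leaves out.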
      

      Attachments

        1. HDFS-8797.000.patch
          5 kB
          Jing Zhao
        2. HDFS-8797.001.patch
          8 kB
          Jing Zhao
        3. HDFS-8797.002.patch
          11 kB
          Jing Zhao
        4. HDFS-8797.003.patch
          11 kB
          Jing Zhao

          People

            jingzhao Jing Zhao
            Votes: 0
            Watchers: 9
