Hadoop HDFS / HDFS-7145

DFSInputStream does not return when reading


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.5.0
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels: None

    Description

      We found that DFSInputStream#read never returns when HBase handlers read files from HDFS. All handler threads are stuck in org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(). The jstack output is as follows:
      "RS_PARALLEL_SEEK-hadoop474:60020-9" prio=10 tid=0x00007f7350be0000 nid=0x1572 runnable [0x000000005a9de000]
         java.lang.Thread.State: RUNNABLE
              at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
              at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
              at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
              at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
              - locked <0x000000039ad6e730> (a sun.nio.ch.Util$2)
              - locked <0x000000039ad6e320> (a java.util.Collections$UnmodifiableSet)
              - locked <0x00000002bf480738> (a sun.nio.ch.EPollSelectorImpl)
              at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
              at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
              at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
              at java.io.FilterInputStream.read(FilterInputStream.java:83)
              at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1986)
              at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:395)
              at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:786)
              at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:665)
              at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:325)
              at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1023)
              at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:966)
              at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1293)
              at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:90)
              at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1223)
              at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1430)
              at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1312)
              at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:392)
              at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
              at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:532)
              at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:553)
              at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:237)
              at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
              at org.apache.hadoop.hbase.regionserver.handler.ParallelSeekHandler.process(ParallelSeekHandler.java:57)
              at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:744)

      Reading the HDFS source code, I found the following:
      1. NioInetPeer#in and NioInetPeer#out are created with a timeout of 0, and that value is passed down unchanged:

        // org.apache.hadoop.hdfs.net.NioInetPeer: both streams get timeout 0
        NioInetPeer(Socket socket) throws IOException {
          this.socket = socket;
          this.in = new SocketInputStream(socket.getChannel(), 0);
          this.out = new SocketOutputStream(socket.getChannel(), 0);
          this.isLocal = socket.getInetAddress().equals(socket.getLocalAddress());
        }

        // org.apache.hadoop.net.SocketInputStream: the timeout is handed to Reader
        public SocketInputStream(ReadableByteChannel channel, long timeout)
                                                              throws IOException {
          SocketIOWithTimeout.checkChannelValidity(channel);
          reader = new Reader(channel, timeout);
        }

          // SocketInputStream.Reader (nested class) forwards it to SocketIOWithTimeout
          Reader(ReadableByteChannel channel, long timeout) throws IOException {
            super((SelectableChannel)channel, timeout);
            this.channel = channel;
          }

        // org.apache.hadoop.net.SocketIOWithTimeout: stored as this.timeout
        SocketIOWithTimeout(SelectableChannel channel, long timeout)
                                                       throws IOException {
          checkChannelValidity(channel);

          this.channel = channel;
          this.timeout = timeout;
          // Set non-blocking
          channel.configureBlocking(false);
        }

      As a result, SocketIOWithTimeout#timeout is 0.
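      2. BlockReaderPeer#peer never has its read timeout or write timeout set, so org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select() is called with timeout=0. Selector.select(0) blocks until the channel becomes ready, so if the DataNode sends nothing, the read never returns.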
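      Why a timeout of 0 hangs forever follows from the JDK contract for Selector.select(long). A positive timeout bounds the wait, while 0 means block indefinitely until a registered channel is ready. The JDK-only sketch below illustrates this. SelectTimeoutDemo and waitReadable are hypothetical names, and a java.nio.channels.Pipe stands in for the DataNode socket.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class, not Hadoop code.
public class SelectTimeoutDemo {

    // Registers the pipe's read end with a fresh Selector and calls
    // Selector.select(timeoutMs), returning the number of ready keys.
    // A timeoutMs of 0 means "block until ready" with no upper bound.
    static int waitReadable(Pipe.SourceChannel source, long timeoutMs) throws IOException {
        try (Selector selector = Selector.open()) {
            source.configureBlocking(false); // register() requires non-blocking mode
            source.register(selector, SelectionKey.OP_READ);
            return selector.select(timeoutMs);
        }
    }

    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();

        // timeout > 0 and nothing to read: select() gives up and reports 0 ready keys.
        int ready = waitReadable(pipe.source(), 200);
        System.out.println("select(200) with no data -> " + ready);

        // timeout == 0: select() returns only once data arrives. A helper thread
        // writes one byte after 300 ms; if nothing were ever written (a silent
        // peer), this call would block forever, like the hang in this issue.
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(300);
                pipe.sink().write(ByteBuffer.wrap(new byte[] {42}));
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();
        ready = waitReadable(pipe.source(), 0);
        System.out.println("select(0) after the write -> " + ready);
        writer.join();
    }
}
```

      The first call should report no ready keys once its 200 ms timeout elapses. The second returns only after the helper thread writes, because a timeout of 0 places no bound on the wait.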

      We can fix this by setting the NioInetPeer's timeout in BlockReaderFactory#nextTcpPeer. Details are in the attached patch file.
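      The same mechanism explains why a non-zero timeout removes the hang. The sketch below is not the attached HDFS-7145.patch, whose contents are not reproduced here. TimedChannelReader and its setReadTimeout method are hypothetical and only stand in for giving the peer a timeout. The sketch assumes reads wait in Selector.select(timeout) on a non-blocking channel, as SocketIOWithTimeout does. With a positive timeout, a silent channel raises SocketTimeoutException instead of waiting forever.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical stand-in for a peer's input stream (not Hadoop code): reads from
// a non-blocking channel and waits for data in Selector.select(readTimeoutMs).
public class TimedChannelReader {
    private final SelectableChannel selectable;
    private final ReadableByteChannel readable;
    private long readTimeoutMs = 0; // 0 = wait forever, the value reported in this issue

    public <C extends SelectableChannel & ReadableByteChannel> TimedChannelReader(C channel)
            throws IOException {
        this.selectable = channel;
        this.readable = channel;
        channel.configureBlocking(false);
    }

    // Hypothetical analogue of setting a read timeout on the peer.
    public void setReadTimeout(long timeoutMs) {
        this.readTimeoutMs = timeoutMs;
    }

    // Returns the number of bytes read (or -1 at end of stream). Throws
    // SocketTimeoutException if a positive timeout elapses with no data.
    public int read(ByteBuffer dst) throws IOException {
        if (!dst.hasRemaining()) {
            return 0; // avoid looping forever on a full buffer
        }
        while (true) {
            int n = readable.read(dst);
            if (n != 0) {
                return n;
            }
            try (Selector selector = Selector.open()) {
                selectable.register(selector, SelectionKey.OP_READ);
                int ready = selector.select(readTimeoutMs); // 0 blocks indefinitely
                if (ready == 0 && readTimeoutMs > 0) {
                    throw new SocketTimeoutException(
                            readTimeoutMs + " ms timeout while waiting for channel to be ready for read");
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        TimedChannelReader reader = new TimedChannelReader(pipe.source());
        reader.setReadTimeout(200); // leave this at 0 and the read below never returns
        try {
            reader.read(ByteBuffer.allocate(4));
        } catch (SocketTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

      Opening a new Selector for each wait keeps the sketch short. SocketIOWithTimeout instead reuses selectors through its SelectorPool, visible in the stack trace, and a real fix would take the timeout value from client configuration rather than hard-coding it.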

      Attachments

        1. HDFS-7145.patch (0.7 kB, Jiandan Yang)


          People

            Assignee: Unassigned
            Reporter: Jiandan Yang (yangjiandan)
            Votes: 0
            Watchers: 5
