Hadoop Common / HADOOP-2883

Extensive write failures

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.16.1
    • Component/s: None
    • Labels: None

    Description

      With the new release 0.16.0 we experience extensive write failures under heavy load.

      The job shuffles 300TB on 1400 nodes and runs 3 waves of 2500 reducers. Each reducer uses libhdfs to write to around 70 dfs files simultaneously. We did not see any particular write problems up to nightly build #835 on hadoopqa (Jan 28),
      but with the released 0.16.0 (candidate 2) we now see many exceptions related to 'all datanodes are bad':

      Typical exceptions:

      08/02/22 10:34:47 WARN fs.DFSClient: Error Recovery for block blk_434406883423887779 in pipeline xxx.xxx.xxx.146:50010, xxx.xxx.xxx.224:50010: bad datanode xxx.xxx.xxx.146:50010
      08/02/22 10:34:51 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:34:51 WARN fs.DFSClient: Error Recovery for block blk_-1957866292089920792 in pipeline xxx.xxx.xxx.147:50010, xxx.xxx.xxx.10:50010: bad datanode xxx.xxx.xxx.147:50010
      08/02/22 10:34:54 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:34:54 WARN fs.DFSClient: Error Recovery for block blk_-5265240773298481019 in pipeline xxx.xxx.xxx.152:50010, xxx.xxx.xxx.71:50010: bad datanode xxx.xxx.xxx.152:50010
      08/02/22 10:34:54 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:34:54 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed outxxx.xxx.xxx.166:50010
      08/02/22 10:34:55 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_8456718220685890569 in pipeline xxx.xxx.xxx.158:50010, xxx.xxx.xxx.225:50010: bad datanode xxx.xxx.xxx.158:50010
      08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_1420965154382429572 in pipeline xxx.xxx.xxx.169:50010, xxx.xxx.xxx.221:50010: bad datanode xxx.xxx.xxx.169:50010
      08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_-519424763987472708 in pipeline xxx.xxx.xxx.154:50010, xxx.xxx.xxx.37:50010: bad datanode xxx.xxx.xxx.154:50010
      08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_-8376556524788296783 in pipeline xxx.xxx.xxx.154:50010, xxx.xxx.xxx.212:50010: bad datanode xxx.xxx.xxx.154:50010
      08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_-2429564741658530079 in pipeline xxx.xxx.xxx.160:50010, xxx.xxx.xxx.105:50010: bad datanode xxx.xxx.xxx.160:50010
      08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_-6653210787685458124 in pipeline xxx.xxx.xxx.143:50010, xxx.xxx.xxx.37:50010: bad datanode xxx.xxx.xxx.143:50010
      08/02/22 10:35:01 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:01 WARN fs.DFSClient: Error Recovery for block blk_7515160028005424426 in pipeline xxx.xxx.xxx.167:50010, xxx.xxx.xxx.152:50010: bad datanode xxx.xxx.xxx.167:50010
      08/02/22 10:35:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:03 WARN fs.DFSClient: Error Recovery for block blk_-7191475898558388503 in pipeline xxx.xxx.xxx.139:50010, xxx.xxx.xxx.6:50010: bad datanode xxx.xxx.xxx.139:50010
      08/02/22 10:35:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:03 WARN fs.DFSClient: Error Recovery for block blk_-340745015348833165 in pipeline xxx.xxx.xxx.141:50010, xxx.xxx.xxx.153:50010: bad datanode xxx.xxx.xxx.141:50010
      08/02/22 10:35:04 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:04 WARN fs.DFSClient: Error Recovery for block blk_-6861254790596076341 in pipeline xxx.xxx.xxx.157:50010, xxx.xxx.xxx.224:50010: bad datanode xxx.xxx.xxx.157:50010
      08/02/22 10:35:14 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:14 INFO fs.DFSClient: Abandoning block blk_6188945400680100475
      08/02/22 10:35:14 INFO fs.DFSClient: Waiting to find target node: xxx.xxx.xxx.161:50010
      08/02/22 10:35:43 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:47 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:48 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:49 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:49 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:50 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:50 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:50 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:53 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:54 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:57 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:35:57 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:36:04 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:36:06 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:36:06 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      08/02/22 10:36:07 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
      Exception in thread "main" java.io.IOException: All datanodes xxx.xxx.xxx.83:50010 are bad. Aborting...
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1839)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)
      Call to org.apache.hadoop.fs.FSDataOutputStream::write failed!
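
A log excerpt like the one above is easier to compare across builds when it is reduced to a few counts. The following is a minimal sketch of such a tally, not part of Hadoop or of this report: the function name `summarize`, the pattern names, and the `sample` list are hypothetical, and the patterns assume only the message wording quoted in this description.

```python
import re
from collections import Counter

# Patterns based solely on the DFSClient messages quoted in this report.
BAD_NODE = re.compile(r"bad datanode (\S+)")
FATAL = re.compile(r"All datanodes .* are bad")
TIMEOUT = "SocketTimeoutException: Read timed out"
ABANDON = "Abandoning block"


def summarize(lines):
    """Tally timeouts, abandoned blocks, fatal aborts, and bad-datanode reports."""
    counts = Counter()
    bad_nodes = Counter()
    for line in lines:
        if TIMEOUT in line:
            counts["timeouts"] += 1
        if ABANDON in line:
            counts["abandoned"] += 1
        if FATAL.search(line):
            counts["fatal"] += 1
        match = BAD_NODE.search(line)
        if match:
            bad_nodes[match.group(1)] += 1
    return counts, bad_nodes


# A few lines copied from the excerpt above (hypothetical sample input).
sample = [
    "08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_-519424763987472708 in pipeline xxx.xxx.xxx.154:50010, xxx.xxx.xxx.37:50010: bad datanode xxx.xxx.xxx.154:50010",
    "08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out",
    "08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_-8376556524788296783 in pipeline xxx.xxx.xxx.154:50010, xxx.xxx.xxx.212:50010: bad datanode xxx.xxx.xxx.154:50010",
    "08/02/22 10:35:14 INFO fs.DFSClient: Abandoning block blk_6188945400680100475",
    'Exception in thread "main" java.io.IOException: All datanodes xxx.xxx.xxx.83:50010 are bad. Aborting...',
]

counts, bad_nodes = summarize(sample)
print(dict(counts))
print(bad_nodes.most_common(3))
```

Running the same tally over client logs from nightly build #835 and from 0.16.0 would give a rough side-by-side view of how much more often timeouts and pipeline recoveries occur; the addresses in this report are masked, so per-node counts here are only indicative.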

      Attachments

        1. packetResponse_0.16.patch
          4 kB
          Dhruba Borthakur
        2. packetResponse.patch
          4 kB
          Dhruba Borthakur

          People

            Assignee: Dhruba Borthakur (dhruba)
            Reporter: Christian Kunz (ckunz)