  1. Hadoop HDFS
  2. HDFS-10609

Uncaught InvalidEncryptionKeyException during pipeline recovery may abort downstream applications

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha2
    • Component/s: encryption
    • Labels:
      None
    • Environment:

      CDH5.8.0

    • Hadoop Flags:
      Reviewed
    • Release Note:
      If pipeline recovery fails due to an expired encryption key, attempt to refresh the key and retry.

      Description

      In normal operation, a SASL negotiation failure caused by InvalidEncryptionKeyException is typically benign: the exception is caught and the connection is retried:

      SaslDataTransferServer#doSaslHandshake
        if (ioe instanceof SaslException &&
            ioe.getCause() != null &&
            ioe.getCause() instanceof InvalidEncryptionKeyException) {
          // This could just be because the client is long-lived and hasn't gotten
          // a new encryption key from the NN in a while. Upon receiving this
          // error, the client will get a new encryption key from the NN and retry
          // connecting to this DN.
          sendInvalidKeySaslErrorMessage(out, ioe.getCause().getMessage());
        } 
      
      DFSOutputStream.DataStreamer#createBlockOutputStream
        if (ie instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0) {
          DFSClient.LOG.info("Will fetch a new encryption key and retry, "
              + "encryption key was invalid when connecting to "
              + nodes[0] + " : " + ie);
      

      However, if the exception is thrown during pipeline recovery, the recovery code does not handle it. The exception propagates to downstream applications such as Solr and aborts their operation:

      2016-07-06 12:12:51,992 ERROR org.apache.solr.update.HdfsTransactionLog: Exception closing tlog.
      org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: Can't re-compute encryption key for nonce, since the required block key (keyID=557709482) doesn't exist. Current key: 1350592619
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1308)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1272)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1433)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1147)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:632)
      2016-07-06 12:12:51,997 ERROR org.apache.solr.update.CommitTracker: auto commit error...:org.apache.solr.common.SolrException: org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: Can't re-compute encryption key for nonce, since the required block key (keyID=557709482) doesn't exist. Current key: 1350592619
      at org.apache.solr.update.HdfsTransactionLog.close(HdfsTransactionLog.java:316)
      at org.apache.solr.update.TransactionLog.decref(TransactionLog.java:505)
      at org.apache.solr.update.UpdateLog.addOldLog(UpdateLog.java:380)
      at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:676)
      at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:623)
      at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: Can't re-compute encryption key for nonce, since the required block key (keyID=557709482) doesn't exist. Current key: 1350592619
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
      at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1308)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1272)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1433)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1147)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:632)

      This exception should be contained within HDFS: it should be caught and retried, just as in createBlockOutputStream().
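
The catch-and-retry behavior requested above can be illustrated with a minimal, self-contained sketch. It is not the HDFS-10609 patch: KeyRefreshRetrySketch, InvalidKeyException, KeyCache, KeyedOperation, and runWithKeyRefresh are hypothetical stand-ins for the real client classes, and the key IDs are simply the ones from the log above. The sketch assumes that refreshing the cached key and repeating the operation is enough to recover, bounded by a refetch budget similar in spirit to refetchEncryptionKey.

```java
// Sketch only: every name below is a hypothetical stand-in, not a Hadoop class.
final class KeyRefreshRetrySketch {

    /** Stand-in for InvalidEncryptionKeyException. */
    static class InvalidKeyException extends java.io.IOException {
        InvalidKeyException(String message) {
            super(message);
        }
    }

    /** An operation (for example, a block transfer) that needs the current key. */
    interface KeyedOperation {
        void run(int keyId) throws java.io.IOException;
    }

    /** Stand-in for a client-side cache of the encryption key. */
    static class KeyCache {
        private int keyId;
        private final java.util.function.IntSupplier keySource;
        int refetchCount = 0;

        KeyCache(int initialKeyId, java.util.function.IntSupplier keySource) {
            this.keyId = initialKeyId;
            this.keySource = keySource;
        }

        int current() {
            return keyId;
        }

        /** Discards the cached key and fetches a fresh one from the source. */
        void refresh() {
            keyId = keySource.getAsInt();
            refetchCount++;
        }
    }

    /**
     * Runs op with the cached key. On an invalid-key failure, refreshes the key
     * and retries, at most maxRefetches times. Other IOExceptions, and an
     * invalid-key failure after the budget is spent, propagate to the caller.
     */
    static void runWithKeyRefresh(KeyCache cache, KeyedOperation op, int maxRefetches)
            throws java.io.IOException {
        int refetchesLeft = maxRefetches;
        while (true) {
            try {
                op.run(cache.current());
                return;
            } catch (InvalidKeyException e) {
                if (refetchesLeft <= 0) {
                    throw e; // budget exhausted: surface the failure
                }
                refetchesLeft--;
                cache.refresh();
            }
        }
    }

    public static void main(String[] args) throws java.io.IOException {
        final int currentKey = 1350592619; // key the simulated DataNode accepts
        KeyCache cache = new KeyCache(557709482, () -> currentKey); // stale key cached
        runWithKeyRefresh(cache, keyId -> {
            if (keyId != currentKey) {
                throw new InvalidKeyException("required block key does not exist: " + keyId);
            }
        }, 2);
        System.out.println("refetches used: " + cache.refetchCount);
    }
}
```

In this sketch the first attempt fails with the stale key, the cache is refreshed once, and the second attempt succeeds, so the caller never sees the exception. With a budget of zero the exception still reaches the caller, which mirrors the behavior reported in this issue.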

        Attachments

        1. HDFS-10609.001.patch
          5 kB
          Wei-Chiu Chuang
        2. HDFS-10609.002.patch
          8 kB
          Wei-Chiu Chuang
        3. HDFS-10609.003.patch
          44 kB
          Wei-Chiu Chuang
        4. HDFS-10609.004.patch
          41 kB
          Wei-Chiu Chuang
        5. HDFS-10609.005.patch
          41 kB
          Wei-Chiu Chuang
        6. HDFS-10609.branch-2.7.patch
          40 kB
          Wei-Chiu Chuang

              People

              • Assignee:
                Wei-Chiu Chuang (jojochuang)
              • Reporter:
                Wei-Chiu Chuang (jojochuang)