Description
The KMS client appears to have no retry logic at all. It is completely decoupled from the IPC retry logic. This has a major impact whenever the KMS is unreachable for any reason, including but not limited to network connection issues, timeouts, or a KMS restart during an upgrade.
This has some major ramifications:
- Jobs may fail to submit, although Oozie resubmit logic should mask it
- Non-Oozie launchers may experience higher failure rates if they do not already have retry logic
- Tasks reading EZ files will fail, although this will probably be masked by framework reattempts
- EZ file creation fails after creating a 0-length file: the client receives the EDEK in the create response, then fails when decrypting the EDEK
- Bulk hadoop fs copies, and maybe distcp, will fail prematurely
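
For illustration only, the kind of client-side retry that is missing could look roughly like the sketch below. It is not taken from the existing KMS client code: `KmsRetrySketch`, `callWithRetries`, and the attempt and delay parameters are all hypothetical names, and it assumes that a transient KMS failure surfaces as an `IOException`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hadoop: retries an operation that may
// fail transiently, e.g. while the KMS is restarting during an upgrade.
public class KmsRetrySketch {

    /**
     * Calls op up to maxAttempts times. Only IOException is treated as
     * retriable (an assumption); any other exception propagates immediately.
     * The sleep between attempts doubles each time, starting at baseDelayMs.
     */
    public static <T> T callWithRetries(Callable<T> op, int maxAttempts, long baseDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, rethrow below
                }
                // Exponential backoff: baseDelayMs, 2x, 4x, ...
                Thread.sleep(baseDelayMs << (attempt - 1));
            }
        }
        throw last;
    }
}
```

A caller would wrap the EDEK decryption call in `callWithRetries`, so that a brief KMS outage delays the operation instead of failing it. Whether retries belong in the client itself or should reuse the IPC retry policies is a separate design question.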