Steve Loughran, thank you for your feedback.
"hadoop.token.files" is not a core-default setting; it is a system property.
The "hadoop.token.files" property can be defined in two places: in the core-default file, or as a system property. This is intentional, because we designed the code for both use cases.
In general, users set it as a system property at runtime.
However, a user might obtain tokens periodically by some other means and store them in a specific directory on their system. In that case they can also list the token filenames in the core-default file. The code handles the case where a file does not exist: a missing file does not break the job, which keeps running without the credential files the user listed.
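A minimal sketch of the lookup described above. It assumes the following: the system property takes precedence over the core-default value (modelled here as a plain `Map`), the value is a comma-separated list, and missing entries are logged and skipped rather than treated as errors. `TokenFileResolver` and `resolve` are hypothetical names. This is plain Java, not Hadoop's actual `UserGroupInformation` code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helper: "hadoop.token.files" may come from a system property
// or from the configuration (core-default file), modelled here as a Map.
class TokenFileResolver {
    static final String HADOOP_TOKEN_FILES = "hadoop.token.files";
    private static final Logger LOG = Logger.getLogger(TokenFileResolver.class.getName());

    /** Returns the listed token files that exist; missing entries are logged and skipped. */
    static List<File> resolve(Map<String, String> conf) {
        // 1. A system property (e.g. -Dhadoop.token.files=...) wins.
        String value = System.getProperty(HADOOP_TOKEN_FILES);
        // 2. Otherwise fall back to the configuration value.
        if (value == null) {
            value = conf.get(HADOOP_TOKEN_FILES);
        }
        List<File> found = new ArrayList<>();
        if (value == null) {
            return found;
        }
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            File f = new File(trimmed);
            if (f.isFile()) {
                found.add(f);
            } else {
                // A missing file does not fail the job; it is only reported.
                LOG.info("Token file " + trimmed + " does not exist or is not a file; skipping");
            }
        }
        return found;
    }
}
```

With this ordering, a `-D` setting on the command line replaces the configured list outright rather than being merged with it.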
Add some more logging too: print out the files before they are loaded, please.
I thought of it as an extension of the HADOOP_TOKEN_FILE_LOCATION feature.
Finally, why skip files that aren't there or aren't files? Isn't that a sign of an error?
As I explained above, the job does not break even if the token files are not available.
We cannot know in advance whether a credential has expired or whether a token file exists.
Skipping missing files lets the job keep working even when it does not have the right credential for a service.
For instance, suppose the job needs to access a WebHDFS filesystem and the credential is not in any of the files listed in hadoop.token.files. The job then falls back to SPNEGO to get the token. Therefore, the job keeps running without stopping.
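The following sketch shows why a missing file can be non-fatal. It assumes a hypothetical `TokenSource` that first looks up a pre-loaded token by service name. If no token is found, it calls a fallback, which stands in for SPNEGO authentication against WebHDFS. The class name, the string-typed tokens, and the service keys are all assumptions; real Hadoop tokens are `Token<?>` objects held in `Credentials`.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration: prefer a token loaded from hadoop.token.files,
// otherwise fall back to another mechanism (standing in for SPNEGO).
class TokenSource {
    private final Map<String, String> loadedTokens;   // service -> token read from token files
    private final Function<String, String> fallback; // e.g. SPNEGO negotiation for WebHDFS

    TokenSource(Map<String, String> loadedTokens, Function<String, String> fallback) {
        this.loadedTokens = loadedTokens;
        this.fallback = fallback;
    }

    String tokenFor(String service) {
        String token = loadedTokens.get(service);
        if (token != null) {
            return token;               // the credential was available in a token file
        }
        return fallback.apply(service); // file missing or token absent: keep working
    }
}
```

Note that this sketch does not detect expired tokens. It only covers the "not present" case.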
Otherwise, someone (and I fear it shall be me) will end up trying to debug why a launched YARN app hasn't picked up credentials from Oozie, with the cause being a typo in the path which wasn't logged at all.
When credentials are passed around a distributed system, a single Credentials object holds multiple tokens, and they are all stored in the one file named by HADOOP_TOKEN_FILE_LOCATION. If the initial client application reads the credential tokens successfully, those tokens can be distributed to other jobs.
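To make the "one object, many tokens, one file" point concrete, here is a sketch built on a hypothetical `SimpleCredentials` class. This is not Hadoop's `Credentials`, whose on-disk format is binary. Tokens are keyed by alias, merged from several sources, and serialised into a single blob using `java.util.Properties`. The `addAll` method lets later entries replace same-alias ones, which is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for a credentials container: alias -> encoded token.
class SimpleCredentials {
    private final Map<String, String> tokens = new LinkedHashMap<>();

    void addToken(String alias, String token) { tokens.put(alias, token); }

    /** Merges another set; entries from 'other' replace entries with the same alias. */
    void addAll(SimpleCredentials other) { tokens.putAll(other.tokens); }

    int numberOfTokens() { return tokens.size(); }

    String getToken(String alias) { return tokens.get(alias); }

    /** Serialises every token into one text blob, as if writing a single token file. */
    String writeToString() {
        Properties p = new Properties();
        p.putAll(tokens);
        StringWriter w = new StringWriter();
        try {
            p.store(w, null);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for a StringWriter
        }
        return w.toString();
    }

    /** Reads back a blob produced by writeToString(). */
    static SimpleCredentials readFromString(String data) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(data));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        SimpleCredentials c = new SimpleCredentials();
        for (String alias : p.stringPropertyNames()) {
            c.addToken(alias, p.getProperty(alias));
        }
        return c;
    }
}
```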
String files = System.getProperty("hadoop.token.files", System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
The env var would get picked up, and the sysprop would override it. Then have one follow-on codepath with the logging I mentioned earlier.
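Here is a sketch of that single-codepath suggestion. The environment variable supplies the default, the system property overrides it, and every path is logged before it would be loaded. The environment is passed in as a `Map` (`System.getenv()` in production) so the logic can be exercised in isolation. `TokenFileLocator` is a hypothetical name. Whether a missing file should be fatal is deliberately left to the caller.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Sketch: env var as default, sysprop override, one codepath with logging.
class TokenFileLocator {
    private static final Logger LOG = Logger.getLogger(TokenFileLocator.class.getName());

    /** 'env' is System.getenv() in production; injectable here for testing. */
    static List<String> locate(Map<String, String> env) {
        String files = System.getProperty("hadoop.token.files",
                env.get("HADOOP_TOKEN_FILE_LOCATION"));
        List<String> paths = new ArrayList<>();
        if (files == null) {
            LOG.info("No token files configured");
            return paths;
        }
        for (String name : files.split(",")) {
            String path = name.trim();
            if (path.isEmpty()) {
                continue;
            }
            // Log the full path before loading, so a typo is visible in the logs.
            File f = new File(path);
            LOG.info("Token file " + f.getAbsolutePath() + " (exists: " + f.isFile() + ")");
            paths.add(path);
        }
        return paths;
    }
}
```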
As it is, there's now the situation that both options can be set. Is that really what is wanted?
The main intention is to read credentials from files as much as possible.
The property allows multiple token filenames, and it does not break existing configurations.
For instance, YARN uses the HADOOP_TOKEN_FILE_LOCATION environment variable as the default credential filename, and that credential file holds multiple tokens. I think it is better to also support multiple token filenames.
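A sketch of the "support both" position, assuming both sources are honoured. The single file from HADOOP_TOKEN_FILE_LOCATION is kept first for backward compatibility. The files listed in hadoop.token.files are added after it, and duplicates are dropped so that no file is read twice. `CombinedTokenFiles` is a hypothetical name, and the ordering is an assumption of this sketch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: keep HADOOP_TOKEN_FILE_LOCATION working and add the
// files listed in hadoop.token.files, without loading the same file twice.
class CombinedTokenFiles {
    static Set<String> combine(String tokenFileLocation, String tokenFilesList) {
        Set<String> files = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        if (tokenFileLocation != null && !tokenFileLocation.trim().isEmpty()) {
            files.add(tokenFileLocation.trim()); // existing single-file behaviour first
        }
        if (tokenFilesList != null) {
            for (String name : tokenFilesList.split(",")) {
                String path = name.trim();
                if (!path.isEmpty()) {
                    files.add(path);
                }
            }
        }
        return files;
    }
}
```

The trade-off, as raised above, is that both settings can now be active at once. A caller that wants the override semantics instead would use only one of the two sources.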