What problems did you run into that compelled you to use the https port for the service in the hftp file system? Earlier, and for hdfs, it uses the RPC port. The RPC port is used because it is ultimately the RPC port that is passed to the TokenSelector to select a token (from ipc.Client). Was this tested with a secure hftp setup, or is there a different call flow that I am missing?
Before this change, getUri() returned the wrong port, i.e., the https port instead of the http port. Thus if you tried newFs = FileSystem.get(hftpFs.getUri(), conf), you would get a filesystem that wouldn't work because it would try to talk http on the https port.
My prior patch changed the uri to return the correct port. Unfortunately, the token renewal was relying on the broken port behavior in the uri (https instead of http), so that getCanonicalService would return the https port when it extracted the authority from the uri. My last change returns the correct port in the uri, and the https port for the service.
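To make the uri/service split concrete, here is a minimal sketch in plain Java. It is not the actual hftp filesystem code: the class `HftpPortSketch`, the host name, and the port numbers 50070 and 50470 are assumptions chosen for illustration only.

```java
import java.net.URI;

// Hypothetical stand-in for an hftp filesystem; not Hadoop code.
final class HftpPortSketch {
    private final String host;
    private final int httpPort;
    private final int httpsPort;

    HftpPortSketch(String host, int httpPort, int httpsPort) {
        this.host = host;
        this.httpPort = httpPort;
        this.httpsPort = httpsPort;
    }

    // The uri advertises the http port, so a filesystem rebuilt from
    // this uri talks http to the port that actually serves http.
    URI getUri() {
        return URI.create("hftp://" + host + ":" + httpPort);
    }

    // The token service keeps the https port, which is what the
    // token renewal described above expects to find.
    String getCanonicalService() {
        return host + ":" + httpsPort;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        HftpPortSketch fs = new HftpPortSketch("nn.example.com", 50070, 50470);
        System.out.println("uri:     " + fs.getUri());
        System.out.println("service: " + fs.getCanonicalService());
    }
}
```

The point of the sketch is only that the two values are derived independently, so fixing the uri no longer changes the service.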
Nothing changed with regard to RPC. An hdfs token, whether obtained over RPC or http, used to be universal and could be used by either transport. The token renewal change made it so that a token acquired over RPC can be used with hftp, but a token obtained over hftp cannot be used for RPC. Hftp looks for either an hftp or an RPC token, makes a copy of the token, and resets the copy's type to RPC. This copy is then serialized into hftp requests. The renewer requires the unaltered hftp token to contain the https address. None of this behavior was changed.
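The select, copy, and reset-type flow described above can be sketched roughly as follows. This is plain Java under stated assumptions, not the real token classes: `Token`, `HFTP_KIND`, `RPC_KIND`, and the kind strings are hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the token handling; not Hadoop code.
final class HftpTokenSketch {
    static final String RPC_KIND = "rpc-token";   // assumed label
    static final String HFTP_KIND = "hftp-token"; // assumed label

    // Immutable token: changing the kind always produces a copy.
    static final class Token {
        final String kind;
        final String service;

        Token(String kind, String service) {
            this.kind = kind;
            this.service = service;
        }

        Token withKind(String newKind) {
            return new Token(newKind, service);
        }
    }

    // Accept either an hftp token or an rpc token, first match wins.
    static Optional<Token> selectToken(List<Token> credentials) {
        for (Token t : credentials) {
            if (HFTP_KIND.equals(t.kind) || RPC_KIND.equals(t.kind)) {
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }

    // The copy sent in requests is retyped to rpc; the original token
    // (and its service address) is left untouched for the renewer.
    static Optional<Token> tokenForRequest(List<Token> credentials) {
        return selectToken(credentials).map(t -> t.withKind(RPC_KIND));
    }
}
```

Keeping the token immutable in the sketch makes the "unaltered original" requirement hold by construction.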
Modified the getCanonicalService changes to be compatible with expectations of the TokenCache.
This method returns null for all the file systems that don't have a valid authority. Why is this change required?
Your change in HADOOP-7661 undid the agreed-upon change in HADOOP-7602. I simply changed it back.
To summarize: The TokenCache is the only user of getCanonicalServiceName. The cache expects the value to be the token's service. Until just recently, the default behavior for getCanonicalServiceName was to encode the authority of the fs's uri into a service. If the uri had no authority, and thus lacked tokens, it would return junk values like ":0". No external filesystems that relied on this behavior could have possibly produced a working token.
Earlier, you were very concerned about the risk of returning an empty string instead of ":0" for filesystems with no authority. On HADOOP-7602, the agreement was to return null instead of ":0" and have the token cache skip the filesystem.
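For illustration, the difference between the old ":0" behavior and the agreed null behavior looks roughly like this. It is a simplified sketch in plain Java: the helper names `legacyServiceName` and `serviceNameOrNull` are hypothetical, and the sketch joins the raw host and port rather than reproducing however the real code encodes the service.

```java
import java.net.URI;

// Hypothetical helpers; not the actual FileSystem implementation.
final class ServiceNameSketch {

    // Old behavior: always build "host:port" from the uri, which
    // degenerates to ":0" when the uri has no authority at all.
    static String legacyServiceName(URI uri, int defaultPort) {
        String host = uri.getHost() == null ? "" : uri.getHost();
        int port = uri.getPort() == -1 ? defaultPort : uri.getPort();
        return host + ":" + port;
    }

    // Agreed behavior: no authority means no token service, so
    // return null and let the caller skip this filesystem.
    static String serviceNameOrNull(URI uri, int defaultPort) {
        if (uri.getAuthority() == null) {
            return null;
        }
        return legacyServiceName(uri, defaultPort);
    }

    public static void main(String[] args) {
        URI withAuthority = URI.create("hdfs://nn.example.com:8020/user");
        URI noAuthority = URI.create("file:///tmp/data");

        System.out.println(legacyServiceName(noAuthority, 0));
        System.out.println(serviceNameOrNull(noAuthority, 0));
        System.out.println(serviceNameOrNull(withAuthority, 0));
    }
}
```

A caller such as the token cache would then only need a null check before looking up a token for the service.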
This is a public API; I am uncomfortable modifying it to return null for all file systems except hdfs and hftp.
That is an incorrect reading of the code. The default is to return a token service for any filesystem that contains an authority (like before). Null is not returned for all other filesystems – null is only returned when the filesystem has no authority, per
If you are concerned about changing a public API, I'm not sure why completely changing the semantics of getCanonicalServiceName is not a cause for concern. It's sometimes a token service (as the method name implies), and sometimes a uri. That's inconsistent, very risky, and incompatible with the token cache's expectation of it being a service. Just because it "works" doesn't mean it's right to abuse the API.