Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
None
None
Description
FSHDFSUtils#isSameHdfs retrieves the canonical service name from Hadoop to determine whether the source and destination are on the same filesystem. The method getCanonicalServiceName() returns the IP address of the file system, and that address can be identical for two file systems that are actually separate storage accounts. As a result, isSameHdfs incorrectly returns true even when the file systems are different.
According to the Hadoop API declaration, it seems this API should not be used to check whether the source and target are in the same filesystem. The token cache is the only user of the canonical service name, and uses it to look up this FileSystem's service tokens.
This error was found while doing an HBase bulk load from one file system to another. Because getCanonicalServiceName() returned the same address for both storage accounts, the two file systems were identified as the same filesystem. When the HBase bulk load command runs, it tries to find the file on the default file system and therefore fails with a FileNotFoundException.
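One possible direction, offered only as a sketch and not as the HBase fix, is to compare the scheme and authority of the two filesystem URIs instead of the canonical service name. The class SameFsCheck, the method sameFileSystem, and the wasb:// URIs below are hypothetical names chosen for illustration. The sketch uses only java.net.URI so that it runs without Hadoop on the classpath.

```java
import java.net.URI;

public class SameFsCheck {

    // Hypothetical helper (not part of HBase or Hadoop): treats two locations
    // as the same filesystem only when both scheme and authority match.
    static boolean sameFileSystem(URI src, URI dst) {
        String srcScheme = src.getScheme();
        String dstScheme = dst.getScheme();
        // A missing scheme means the location cannot be compared safely.
        if (srcScheme == null || dstScheme == null) {
            return false;
        }
        // URI schemes are case-insensitive.
        if (!srcScheme.equalsIgnoreCase(dstScheme)) {
            return false;
        }
        String srcAuthority = src.getAuthority();
        String dstAuthority = dst.getAuthority();
        // A missing authority is treated conservatively as "not the same".
        if (srcAuthority == null || dstAuthority == null) {
            return false;
        }
        // Exact match is deliberately strict: a false negative only costs a
        // copy, while a false positive reproduces the failure described above.
        return srcAuthority.equals(dstAuthority);
    }

    public static void main(String[] args) {
        // Assumed example URIs for two separate storage accounts.
        URI a = URI.create("wasb://container1@accounta.blob.core.windows.net/data/hfiles");
        URI b = URI.create("wasb://container1@accountb.blob.core.windows.net/hbase/staging");
        URI c = URI.create("wasb://container1@accounta.blob.core.windows.net/hbase/staging");

        System.out.println("a vs b: " + sameFileSystem(a, b));
        System.out.println("a vs c: " + sameFileSystem(a, c));
    }
}
```

The comparison errs on the side of reporting different filesystems, since the failure described in this issue comes from wrongly reporting them as the same.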