Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
0.15.0
None
None
Description
hdfsListDirectory makes one RPC call using the deprecated fs.FileSystem.listPaths, and then two more RPC calls for every entry in the returned array. Consider a job with more than 3000 mappers, each running a Pipes application that uses libhdfs to scan a DFS directory of about 100-200 entries. That adds up to roughly 1M RPC calls to the namenode, which overwhelms it.
hdfsListDirectory should call fs.FileSystem.listStatus instead, which returns the FileStatus of every entry in the directory from a single call.
I will submit a patch.