Thanks for picking this up, zhihai xu.
Do we even need to bother with HDFS? MiniDFSCluster makes every test it's a part of noticeably slower, and I don't see why we need it here. I got the impression "hdfs://testcluster" was just bogus input anyway, so can we just use something like "file:///" instead? That is the default filesystem, and we're already relying on other defaults like mapreduce.framework.name=local.
As an aside, I'd personally prefer that we explicitly set the values we expect to be using in unit tests, even if they're the defaults, for two reasons: 1) it documents the setup and requirements of the test, and 2) it protects the test from custom confs in HADOOP_CONF_DIR being picked up when the test is run.
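
To illustrate the second reason, here is a rough sketch that uses plain java.util.Properties as a stand-in for layered configuration. It is not the Hadoop Configuration API. The class and method names are hypothetical, "fs.defaultFS" is the key I'm assuming for the default filesystem, and the site override values are made up for the example.

```java
import java.util.Properties;

public class ExplicitTestConf {
    // Built-in defaults the test would otherwise rely on implicitly.
    static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("fs.defaultFS", "file:///");
        p.setProperty("mapreduce.framework.name", "local");
        return p;
    }

    // Hypothetical site-level overrides, standing in for custom confs
    // that happen to be present in HADOOP_CONF_DIR on the machine running the test.
    static Properties siteOverrides(Properties parent) {
        Properties p = new Properties(parent);
        p.setProperty("fs.defaultFS", "hdfs://some-cluster:8020");
        p.setProperty("mapreduce.framework.name", "yarn");
        return p;
    }

    // What the test sets explicitly: the same values as the defaults,
    // but stated in the test so that site overrides cannot change them.
    static Properties explicitTestConf(Properties parent) {
        Properties p = new Properties(parent);
        p.setProperty("fs.defaultFS", "file:///");
        p.setProperty("mapreduce.framework.name", "local");
        return p;
    }

    public static void main(String[] args) {
        Properties site = siteOverrides(defaults());
        // A test that relies on defaults sees whatever the site layer says.
        System.out.println("implicit: " + site.getProperty("fs.defaultFS"));

        Properties test = explicitTestConf(site);
        // A test that sets its values explicitly sees what it asked for.
        System.out.println("explicit: " + test.getProperty("fs.defaultFS"));
    }
}
```

In the actual test the same idea would presumably be a couple of explicit set calls on the test's configuration object before the job is created.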