When switching SolrCloud from a local dataDir to the HDFS directory factory, indexing performance falls through the floor.
I've also observed very high latency in both QTime and a code timer on HDFS writes compared to local dataDir writes (using check_solr_write.pl from https://github.com/harisekhon/nagios-plugins). Single test document write latency jumps from a few dozen milliseconds to 700-1700 milliseconds, and over 2000 on some runs.
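For anyone wanting to reproduce the latency comparison without the Perl plugin, here is a minimal sketch of the same idea: write one test document and report both Solr's own QTime and the client-side wall-clock time. This is not how check_solr_write.pl is implemented. The function names, the `commit=true` parameter and the example host, collection and document are assumptions for illustration, and the sketch does no Kerberos/SPNEGO authentication, so on a secured cluster the HTTP call would need replacing.

```python
import json
import time
import urllib.request


def post_document(base_url, collection, doc, timeout=30):
    """POST one document to Solr's JSON update handler; return the parsed response."""
    # commit=true is an assumption so that the write is flushed within the timed call.
    url = f"{base_url}/{collection}/update?wt=json&commit=true"
    req = urllib.request.Request(
        url,
        data=json.dumps([doc]).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def time_write(send):
    """Call send() once; return Solr's reported QTime and the client wall-clock time in ms."""
    start = time.perf_counter()
    response = send()
    wall_ms = (time.perf_counter() - start) * 1000.0
    return {"qtime_ms": response["responseHeader"]["QTime"], "wall_ms": wall_ms}
```

Hypothetical usage against an unsecured node: `time_write(lambda: post_document("http://solr-host:8983/solr", "collection1", {"id": "latency-test-1"}))`. Running it once against a local dataDir collection and once against an HDFS-backed one gives the two figures to compare.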
A bulk online indexing job from Hive to SolrCloud that previously took 2 hours for 620M rows was projected to take 20+ hours and never completed, usually breaking around the 16-17 hour mark when left overnight.
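To put those job durations in throughput terms, the arithmetic can be sketched as follows. The only inputs are the figures quoted above (620M rows, 2 hours, 20 hours). Since the HDFS job was projected at 20+ hours and never finished, the HDFS rate is an upper bound and the slowdown a lower bound.

```python
def rows_per_second(rows, hours):
    """Average indexing throughput for a job of the given size and duration."""
    return rows / (hours * 3600.0)


ROWS = 620_000_000

local_rate = rows_per_second(ROWS, 2)   # local dataDir: roughly 86,000 rows/s
hdfs_rate = rows_per_second(ROWS, 20)   # HDFS, at the 20-hour projection: roughly 8,600 rows/s
slowdown = local_rate / hdfs_rate       # about 10x, as a lower bound

print(f"local: {local_rate:,.0f} rows/s, hdfs: {hdfs_rate:,.0f} rows/s, slowdown: {slowdown:.1f}x")
```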
It's worth noting that I had to disable the HDFS write cache, which was causing index corruption (SOLR-7255), on the advice of Mark Miller, who tells me this doesn't make much performance difference anyway.
This is probably also related to SolrCloud not respecting the HDFS replication factor, effectively making 4 copies of the data instead of 2 (SOLR-6528), but that alone doesn't account for the massive performance drop going from vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.