Thanks, Cos, for your feedback. Now it's my turn to respond.
1) Purpose of this patch: It's good to decouple Hadoop services from HDFS semantics wherever possible. This will pave the way for using Bigtop to deploy more than just standard HDFS-based Hadoop services. That's the main purpose of this patch. As a side effect, it also cleans up some things and incrementally improves issues like (2) below:
2) Regarding the "partially initialized file systems": That is a great point! That is actually why we've put in "mkdir -p" instead of just "mkdir" as part of this patch. Because "-p" does not fail when a directory already exists, the script can be rerun on a file system that was only partly set up. Thus, the "partially initialized FS" problem is handled much more flexibly by the init-hcfs.sh script than by the original init-hdfs.sh script.
3) Regarding "very slow performance of init-hdfs": You are right that your idea of using the DFS APIs directly could be good for performance. This is synergistic with init-hcfs.sh. By making a "generic" init-hcfs.sh script (look closely at the patch and you will see that init-hdfs.sh is now much simpler), it paves the way for you HDFS folks to create an optimized HDFS path for file creation, while also contributing an HCFS-compliant alternative that the HCFS community can use with our Bigtop-based deployments.
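To illustrate point (2), here is a minimal sketch of the idempotent directory creation that "mkdir -p" makes possible. This is not the actual init-hcfs.sh from the patch: the function name init_dirs, the MKDIR_CMD variable, and the example paths are hypothetical, and it assumes a Hadoop version whose FS shell accepts "-mkdir -p".

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the init-hcfs.sh script from the patch.
# MKDIR_CMD is an assumed knob: it defaults to the Hadoop FS shell, and can be
# overridden (e.g. MKDIR_CMD="mkdir -p") to try the logic on a local file system.
MKDIR_CMD="${MKDIR_CMD:-hadoop fs -mkdir -p}"

# Create every directory passed as an argument.
# With -p, a directory that already exists is not an error, so a run that
# was interrupted halfway can simply be repeated.
init_dirs() {
  local dir
  for dir in "$@"; do
    # MKDIR_CMD is intentionally unquoted so it splits into command + flags.
    $MKDIR_CMD "$dir" || return 1
  done
}
```

For example, calling `init_dirs /tmp /user/hive` twice in a row (illustrative paths only) should succeed both times, whereas a plain "mkdir" would fail on the second run.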
I think the 3 bullets above are a good start to an important debate that NEEDS to happen in the open.
Let's please keep this debate going. The dialogue is probably just as important as the patch.
Now, in case that's not a compelling argument for this patch, here's an alternative approach:
If you still feel that having init-hdfs.sh and init-hcfs.sh as side-by-side utilities is bad, then maybe I can add init-hcfs.sh into Bigtop so that we, the broader FileSystem ecosystem (which will ultimately contribute back and improve HDFS by contributing to the robustness of its interfaces and tests), have a foothold in Bigtop from which we can innovate, diversifying the Bigtop stack so that it supports a wider range of Hadoop deployments.