Awesome! Doc patches are great!
Now for the review. This isn't comprehensive, but here's a first pass at least.
A common problem is a missing 'the' in front of 'following'. I pointed it out in a few places, but there are more. Articles in English are tricky. 'Following' is particularly tricky though because 'following' without 'the' or 'a' in front of it is a verb (e.g., "The dog was following the boy" vs. "Hadoop has a following" or "See the following list of cool places to eat".). But hey, at least we don't have genders like the European languages though!
+other cases, like: Hadoop nodes running on virtualized platform, we have
+additional "hypervisor" layer, and its characteristics include:
I don't know how to parse this phrasing. It feels awkward. I'd probably rewrite as:
However for some cases, this is insufficient. Take for example Hadoop nodes running on a virtualized platform where there is an additional hypervisor layer. It has the following characteristics:
+- The communication price between VMs within the same hypervisor is lower
+than across hypervisor (physical host) which will have higher throughput,
+lower latency, and not generating physical network traffic.
Same sort of problem. I'd probably rephrase a bit:
"The communication price between multiple VMs running on one physical host is lower than the communication price between processes on multiple physical hosts. In addition to the multiple VMs having higher throughput and lower latency between themselves, they do not generate any network traffic on the wire."
transparent for Hadoop, so
'for' should be 'to'. Hadoop (period). (new sentence) So
like the following:
layer, following polices
+in hdfs are refined:
the following. HDFS.
+- Replica placement policy
I have a feeling bullet points in front of all the items listed under this section may render better. I need to play with it though.
of the writer
on another node
if the node of the writer
The remaining replicas are placed randomly across rack and node group to
+ meet minimum restriction.
I'm confused by this since there are missing articles and/or plurals here. Does this mean randomly across the remaining racks or randomly across all racks including the writer's rack?
At the node level
At the block level
Reliability: By never placing more than one replicas on the same node
+group(physical host), in case of node group failure, only one replica is
+lost at maximum.
Awkward phrasing. I'd probably rewrite as:
"Reliability: By never placing more than one replica in the same node
group (aka physical host), only one replica is lost at maximum in case of node group failure."
than a remote
+3-layer topology tends to support different failure and locality topologies
+which is primarily driven from the perspective of virtualization, however,
+it is also possible to use the feature support other scenarios, such as
+those relating to failures of power supplies, arbitrary sets of physical
+servers, or collections of servers from same hardware purchase cycle.
This paragraph feels like it should be up closer to the top of these changes.