We're writing about 20 million records, one integer each, from Pig map tasks.
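For the case of 12 nodes, with 256-384 vnodes per node, we get around 4000 threads per mapper. That figure is consistent with roughly one thread per vnode token range (12 x 256 = 3072 up to 12 x 384 = 4608). This obviously overloads the nodes, since the number of RPC threads is capped, and the write fails.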
Also, each transfer carries only on the order of a few bytes of payload, so batching is clearly a good solution.
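
To make the batching idea concrete, here is a minimal sketch in Python. It is not the actual Pig or Cassandra output code: `BatchingWriter`, `send`, `batch_size` and `count_round_trips` are hypothetical names chosen for illustration, and `send` stands in for whatever call really ships a group of records to a node. The point is only that many tiny records get buffered and sent in one call instead of one call each.

```python
from typing import Callable, List


class BatchingWriter:
    """Buffers small records and hands them to `send` in groups.

    `send` is a hypothetical callback standing in for the real
    network write; it receives one list of records per call.
    """

    def __init__(self, send: Callable[[List[int]], None], batch_size: int = 1000):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._send = send
        self._batch_size = batch_size
        self._buffer: List[int] = []

    def write(self, record: int) -> None:
        # Queue the record; only touch the network when the buffer is full.
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered, then start a fresh buffer.
        if self._buffer:
            self._send(self._buffer)
            self._buffer = []

    def __enter__(self) -> "BatchingWriter":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Flush the final partial batch when the task finishes.
        self.flush()
        return False


def count_round_trips(num_records: int, batch_size: int) -> int:
    """Return how many `send` calls are needed for `num_records` records."""
    calls: List[List[int]] = []
    with BatchingWriter(calls.append, batch_size) as writer:
        for value in range(num_records):
            writer.write(value)
    return len(calls)
```

With a batch size of 1000, the 20 million records would go out in 20,000 sends rather than 20 million. The right batch size is an open tuning question and would depend on the cluster.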