Details
Description
HADOOP-14520 introduced a regression in hflush() and hsync(). Previously, in the default case where users upload data as block blobs, these calls were no-ops. HADOOP-14520 unintentionally made them active by default, so any data buffered in the stream is now uploaded to storage immediately on every call. This behavior is undesirable because each such upload adds a block, and a block blob is limited to 50,000 blocks. Spark invokes hflush() frequently, so Spark users are now seeing failures when their writes exceed the block limit.
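To illustrate why frequent hflush() calls exhaust the block limit, here is a minimal, hypothetical model of a block-blob output stream. It is not the hadoop-azure implementation. The class name `BlockBlobFlushModel`, the 4 MB buffer size, and the 100-byte record size are all assumptions made for illustration; the only detail taken from the description is the 50,000-block limit. The model counts one block per buffer upload, and `flushUploads` toggles between the old no-op default and the regressed upload-on-flush default.

```java
/**
 * Hypothetical, simplified model of a block-blob output stream.
 * It is NOT the hadoop-azure implementation; it only counts uploaded blocks.
 */
public class BlockBlobFlushModel {
    /** A block blob can hold at most 50,000 blocks. */
    static final int MAX_BLOCKS = 50_000;

    private final boolean flushUploads; // true models the regressed (post-HADOOP-14520) default
    private final int blockSize;        // assumed buffer size that forces an upload
    private int buffered = 0;           // bytes currently buffered in the stream
    private int blocks = 0;             // blocks uploaded so far

    public BlockBlobFlushModel(boolean flushUploads, int blockSize) {
        this.flushUploads = flushUploads;
        this.blockSize = blockSize;
    }

    public void write(int numBytes) {
        buffered += numBytes;
        // A full buffer is uploaded as one block regardless of flush behavior.
        while (buffered >= blockSize) {
            uploadBlock(blockSize);
        }
    }

    public void hflush() {
        // Old default: no-op. Regressed default: upload whatever is buffered as a new block.
        if (flushUploads && buffered > 0) {
            uploadBlock(buffered);
        }
    }

    public void close() {
        // Any remaining buffered data becomes the final block.
        if (buffered > 0) {
            uploadBlock(buffered);
        }
    }

    public int blocks() {
        return blocks;
    }

    private void uploadBlock(int size) {
        if (blocks >= MAX_BLOCKS) {
            throw new IllegalStateException("exceeded " + MAX_BLOCKS + " blocks");
        }
        blocks++;
        buffered -= size;
    }

    /** Simulates a writer that calls hflush() after every 100-byte record (assumed workload). */
    public static int simulate(boolean flushUploads, int records) {
        BlockBlobFlushModel out = new BlockBlobFlushModel(flushUploads, 4 * 1024 * 1024);
        for (int i = 0; i < records; i++) {
            out.write(100);
            out.hflush();
        }
        out.close();
        return out.blocks();
    }

    public static void main(String[] args) {
        // No-op hflush(): 6,000,000 bytes fit in 2 blocks of up to 4 MB.
        System.out.println(simulate(false, 60_000)); // prints 2
        try {
            // Upload-on-hflush(): one block per record, so record 50,001 fails.
            simulate(true, 60_000);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "exceeded 50000 blocks"
        }
    }
}
```

Under these assumptions the same 60,000-record workload needs only 2 blocks when hflush() is a no-op, but one block per record when every hflush() uploads the buffer, which crosses the 50,000-block limit.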