Details
- Documentation
- Status: Resolved
- Priority: Trivial
- Resolution: Fixed
- Fix Version: 2.0.1
Description
http://spark.apache.org/docs/latest/programming-guide.html
"By default, Spark creates one partition for each block of the file (blocks being 64MB by default in HDFS)"
Currently, the default block size in HDFS is 128MB.
The default was already increased in Hadoop 2.2.0, the oldest Hadoop version supported by Spark. https://issues.apache.org/jira/browse/HDFS-4053
Since the current explanation is confusing, I'd like to update the value from 64MB to 128MB.
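For reference, a minimal sketch of how the block-to-partition relationship can be observed from a Spark application. The file path below is a placeholder, and the actual partition count depends on the input format and the cluster's dfs.blocksize setting (128MB by default on Hadoop 2.2.0+).

import org.apache.spark.{SparkConf, SparkContext}

object PartitionCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PartitionCheck")
    val sc = new SparkContext(conf)

    // "hdfs:///data/sample.txt" is a placeholder path, not taken from this issue.
    // With the 128MB default block size, a ~1GB file yields roughly 8 partitions,
    // since Spark creates about one partition per HDFS block.
    val rdd = sc.textFile("hdfs:///data/sample.txt")
    println(s"Number of partitions: ${rdd.getNumPartitions}")

    sc.stop()
  }
}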