[SPARK-6235] Address various 2G limits - ASF JIRA

XML

Word

Printable

JSON

An umbrella ticket to track the various 2G limit we have in Spark, due to the use of byte arrays and ByteBuffers.

is duplicated by

SPARK-22622 OutOfMemory thrown by Closure Serializer without proper failure propagation

SPARK-2755 TorrentBroadcast cannot broadcast very large objects

SPARK-1391 BlockManager cannot transfer blocks larger than 2G in size

is related to

SPARK-22352 task failures with java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE error

SPARK-22062 BlockManager does not account for memory consumed by remote fetches

links to

[Github] Pull Request #14647 (witgo)

(4 links to)

1.	Change default value for spark.maxRemoteBlockSizeFetchToMem to be < 2GB	Resolved	Imran Rashid
2.	create LargeByteBuffer abstraction for eliminating 2GB limit on blocks	Resolved	Josh Rosen
3.	Support caching blocks larger than 2G	Resolved	Unassigned
4.	Support uploading blocks > 2GB as a stream	Resolved	Imran Rashid
5.	Support shuffle where individual blocks might be > 2G	Resolved	Jin Xing
6.	Replace ByteBuffer with ChunkedByteBuffer	Resolved	Unassigned
7.	Replace ByteBuf with InputStream	Resolved	Unassigned
8.	Support for parallelizing R data.frame larger than 2GB	Resolved	Hossein Falaki
9.	Support replicating blocks larger than 2 GB	Resolved	Imran Rashid
10.	Support sending messages over 2GB from memory	Resolved	Imran Rashid
11.	Better error message when trying a shuffle fetch over 2 GB	Resolved	Unassigned