Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Implemented
0.9.0, 1.0.0
None
None
Description
Hadoop uses `CBZip2InputStream` to decode bzip2 files. However, the implementation is not thread-safe, and Spark may run multiple tasks concurrently as threads in the same executor JVM, which leads to the error reported in this issue. This is not a problem for Hadoop MapReduce because Hadoop runs each task in a separate JVM.
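The following is a hypothetical Java sketch, not Hadoop's code, of why per-JVM isolation matters here. It assumes the unsafe state is a static field, one common way a class ends up non-thread-safe; this issue does not say which state in `CBZip2InputStream` is shared. `SharedStateDemo`, `StaticStateDecoder`, and `InstanceStateDecoder` are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical illustration, NOT Hadoop's CBZip2InputStream: it assumes the
// shared state is a static field, which the issue itself does not specify.
public class SharedStateDemo {

    // Toy decoder whose read cursor is static: one copy is shared by every
    // instance in the JVM, so two open streams interfere with each other.
    static class StaticStateDecoder {
        private static int pos = 0;
        private final int[] data;

        StaticStateDecoder(int[] data) {
            this.data = data;
            pos = 0; // opening a new stream also resets every other stream's cursor
        }

        int read() {
            return pos < data.length ? data[pos++] : -1; // -1 signals end of stream
        }
    }

    // Same decoder with a per-instance cursor: each stream owns its state.
    static class InstanceStateDecoder {
        private int pos = 0;
        private final int[] data;

        InstanceStateDecoder(int[] data) {
            this.data = data;
        }

        int read() {
            return pos < data.length ? data[pos++] : -1;
        }
    }

    // Alternates reads between two streams, mimicking two tasks in one JVM.
    // Done on a single thread so the interleaving (and the corruption) is repeatable.
    static List<Integer> interleave(IntSupplier first, IntSupplier second, int rounds) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            out.add(first.getAsInt());
            out.add(second.getAsInt());
        }
        return out;
    }

    static List<Integer> readWithStaticState(int[] a, int[] b) {
        StaticStateDecoder da = new StaticStateDecoder(a);
        StaticStateDecoder db = new StaticStateDecoder(b);
        return interleave(da::read, db::read, a.length);
    }

    static List<Integer> readWithInstanceState(int[] a, int[] b) {
        InstanceStateDecoder da = new InstanceStateDecoder(a);
        InstanceStateDecoder db = new InstanceStateDecoder(b);
        return interleave(da::read, db::read, a.length);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {10, 20, 30};
        // prints "static:   [1, 20, 3, -1, -1, -1]" (the streams corrupt each other)
        System.out.println("static:   " + readWithStaticState(a, b));
        // prints "instance: [1, 10, 2, 20, 3, 30]" (each stream is read intact)
        System.out.println("instance: " + readWithInstanceState(a, b));
    }
}
```

A static field has one copy per loaded class in a JVM, so tasks running in separate JVMs (as in Hadoop MapReduce) never see each other's state, while task threads inside one Spark executor do. With real threads the interleaving is nondeterministic rather than the fixed alternation used above.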
A workaround for a standalone cluster is to set `SPARK_WORKER_CORES=1` in `spark-env.sh`, so that each worker offers only one core and therefore runs at most one task at a time.
Issue Links
- requires: HADOOP-10614 CBZip2InputStream is not threadsafe (Closed)