Spark / SPARK-1861

ArrayIndexOutOfBoundsException when reading bzip2 files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 0.9.0, 1.0.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      Hadoop uses CBZip2InputStream to decode bzip2 files. That implementation is not thread-safe, and because Spark may run multiple tasks concurrently in the same JVM, reading bzip2 files can fail with an ArrayIndexOutOfBoundsException. Hadoop MapReduce is not affected because it runs each task in a separate JVM.

      A workaround is to set `SPARK_WORKER_CORES=1` in spark-env.sh for a standalone cluster.
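
As a minimal sketch of that workaround, the setting could be added to spark-env.sh as shown below. The file location (conf/spark-env.sh under the Spark installation directory) is an assumption about a typical standalone deployment, and the workers would likely need to be restarted for the change to take effect.

```shell
# conf/spark-env.sh (assumed location; sourced when the standalone worker starts)
# Offer a single core per worker so that only one task at a time runs in
# each executor JVM, which avoids concurrent use of CBZip2InputStream.
export SPARK_WORKER_CORES=1
```

The trade-off is reduced parallelism on each worker, so this is a stopgap rather than a fix.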

            People

              Assignee: Xiangrui Meng (mengxr)
              Reporter: Xiangrui Meng (mengxr)
              Votes: 0
              Watchers: 2
