SPARK-30649: Azure Spark read: ContentMD5 header is missing in the response


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Severity: Important

    Description

      When we read a CSV file from Azure Blob Storage, the read fails with the exception "ContentMD5 header is missing in the response."
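      A minimal sketch of the kind of read that triggers the failure, in Scala. The container and account names below are placeholders; the CSV path is the one from the error log that follows:

          // Hypothetical container/account; "Performance_Dataset/PR_DS_1cr.csv" is the path from the log below.
          val df = spark.read
            .option("header", "true")
            .csv("wasbs://<container>@<account>.blob.core.windows.net/Performance_Dataset/PR_DS_1cr.csv")

          df.count() // forces the scan; tasks then fail with MissingContentMD5Header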


      2020-01-27 11:03:12.255 ERROR 8 — [r for task 1030] o.a.h.fs.azure.NativeAzureFileSystem : Encountered Storage Exception for read on Blob : Performance_Dataset/PR_DS_1cr.csv/part-00000-1af0b4b3-018e-4847-9441-3e5239c94e33-c000.csv Exception details: java.io.IOException Error Code : MissingContentMD5Header
      2020-01-27 11:03:12.258 ERROR 8 — [r for task 1030] org.apache.spark.executor.Executor : Exception in task 0.0 in stage 626.0 (TID 1030)

      java.io.IOException: null
          at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:737) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(BlobInputStream.java:264) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.blob.BlobInputStream.readInternal(BlobInputStream.java:448) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.blob.BlobInputStream.read(BlobInputStream.java:420) ~[azure-storage-5.0.0.jar!/:na]
          at org.apache.hadoop.fs.azure.BlockBlobInputStream.read(BlockBlobInputStream.java:281) ~[hadoop-azure-2.9.0.jar!/:na]
          at org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsInputStream.read(NativeAzureFileSystem.java:882) ~[hadoop-azure-2.9.0.jar!/:na]
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) ~[na:1.8.0_212]
          at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_212]
          at java.io.DataInputStream.read(DataInputStream.java:149) ~[na:1.8.0_212]
          at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59) ~[hadoop-mapreduce-client-core-2.7.2.jar!/:na]
          at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:218) ~[hadoop-common-2.9.0.jar!/:na]
          at org.apache.hadoop.util.LineReader.readLine(LineReader.java:176) ~[hadoop-common-2.9.0.jar!/:na]
          at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91) ~[hadoop-mapreduce-client-core-2.7.2.jar!/:na]
          at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144) ~[hadoop-mapreduce-client-core-2.7.2.jar!/:na]
          at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184) ~[hadoop-mapreduce-client-core-2.7.2.jar!/:na]
          at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.datasources.HadoopFileLinesReader.hasNext(HadoopFileLinesReader.scala:50) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) ~[scala-library-2.11.12.jar!/:na]
          at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) ~[scala-library-2.11.12.jar!/:na]
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:186) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[na:na]
          at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.scheduler.Task.run(Task.scala:109) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_212]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_212]
          at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]
      Caused by: com.microsoft.azure.storage.StorageException: ContentMD5 header is missing in the response.
          at com.microsoft.azure.storage.blob.CloudBlob$9.preProcessResponse(CloudBlob.java:1359) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.blob.CloudBlob$9.preProcessResponse(CloudBlob.java:1310) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:139) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.blob.CloudBlob.downloadRangeInternal(CloudBlob.java:1492) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(BlobInputStream.java:255) ~[azure-storage-5.0.0.jar!/:na]
          ... 39 common frames omitted

      2020-01-27 11:03:12.259 ERROR 8 — [result-getter-2] o.apache.spark.scheduler.TaskSetManager : Task 0 in stage 626.0 failed 1 times; aborting job
      2020-01-27 11:03:12.264 ERROR 8 — [io-8090-exec-10] o.a.c.c.C.[.[.[.[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [/connectors] threw exception [Request processing failed; nested exception is org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 626.0 failed 1 times, most recent failure: Lost task 0.0 in stage 626.0 (TID 1030, localhost, executor driver): java.io.IOException
          at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:737)
          at com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(BlobInputStream.java:264)
          at com.microsoft.azure.storage.blob.BlobInputStream.readInternal(BlobInputStream.java:448)
          at com.microsoft.azure.storage.blob.BlobInputStream.read(BlobInputStream.java:420)
          at org.apache.hadoop.fs.azure.BlockBlobInputStream.read(BlockBlobInputStream.java:281)
          at org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsInputStream.read(NativeAzureFileSystem.java:882)
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
          at java.io.DataInputStream.read(DataInputStream.java:149)
          at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
          at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:218)
          at org.apache.hadoop.util.LineReader.readLine(LineReader.java:176)
          at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
          at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144)
          at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184)
          at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
          at org.apache.spark.sql.execution.datasources.HadoopFileLinesReader.hasNext(HadoopFileLinesReader.scala:50)
          at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
          at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:186)
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
          at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
          at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
          at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
          at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
          at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
          at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
          at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
          at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
          at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
          at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
          at org.apache.spark.scheduler.Task.run(Task.scala:109)
          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: com.microsoft.azure.storage.StorageException: ContentMD5 header is missing in the response.
          at com.microsoft.azure.storage.blob.CloudBlob$9.preProcessResponse(CloudBlob.java:1359)
          at com.microsoft.azure.storage.blob.CloudBlob$9.preProcessResponse(CloudBlob.java:1310)
          at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:139)
          at com.microsoft.azure.storage.blob.CloudBlob.downloadRangeInternal(CloudBlob.java:1492)
          at com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(BlobInputStream.java:255)
          ... 39 more

      Driver stacktrace:] with root cause

      com.microsoft.azure.storage.StorageException: ContentMD5 header is missing in the response.
          at com.microsoft.azure.storage.blob.CloudBlob$9.preProcessResponse(CloudBlob.java:1359) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.blob.CloudBlob$9.preProcessResponse(CloudBlob.java:1310) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:139) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.blob.CloudBlob.downloadRangeInternal(CloudBlob.java:1492) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(BlobInputStream.java:255) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.blob.BlobInputStream.readInternal(BlobInputStream.java:448) ~[azure-storage-5.0.0.jar!/:na]
          at com.microsoft.azure.storage.blob.BlobInputStream.read(BlobInputStream.java:420) ~[azure-storage-5.0.0.jar!/:na]
          at org.apache.hadoop.fs.azure.BlockBlobInputStream.read(BlockBlobInputStream.java:281) ~[hadoop-azure-2.9.0.jar!/:na]
          at org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsInputStream.read(NativeAzureFileSystem.java:882) ~[hadoop-azure-2.9.0.jar!/:na]
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) ~[na:1.8.0_212]
          at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_212]
          at java.io.DataInputStream.read(DataInputStream.java:149) ~[na:1.8.0_212]
          at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59) ~[hadoop-mapreduce-client-core-2.7.2.jar!/:na]
          at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:218) ~[hadoop-common-2.9.0.jar!/:na]
          at org.apache.hadoop.util.LineReader.readLine(LineReader.java:176) ~[hadoop-common-2.9.0.jar!/:na]
          at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91) ~[hadoop-mapreduce-client-core-2.7.2.jar!/:na]
          at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144) ~[hadoop-mapreduce-client-core-2.7.2.jar!/:na]
          at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184) ~[hadoop-mapreduce-client-core-2.7.2.jar!/:na]
          at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.datasources.HadoopFileLinesReader.hasNext(HadoopFileLinesReader.scala:50) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) ~[scala-library-2.11.12.jar!/:na]
          at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) ~[scala-library-2.11.12.jar!/:na]
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:186) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[na:na]
          at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) ~[spark-sql_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.scheduler.Task.run(Task.scala:109) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) ~[spark-core_2.11-2.3.1.jar!/:2.3.1]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_212]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_212]
          at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]
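      The root cause is raised inside the storage SDK's response validation during a ranged download (CloudBlob.downloadRangeInternal, then CloudBlob$9.preProcessResponse). The azure-storage Java SDK exposes a client-side switch for this check via BlobRequestOptions.setDisableContentMD5Validation. The sketch below (Scala, with a placeholder connection string and container name) reads the same blob directly with that check disabled, which can help confirm that the missing header, rather than corrupt data, is what aborts the read; whether hadoop-azure 2.9.0 exposes an equivalent setting is a separate question.

          import com.microsoft.azure.storage.CloudStorageAccount
          import com.microsoft.azure.storage.blob.BlobRequestOptions

          // Placeholders: supply a real connection string and container; the blob path is from the log.
          val account   = CloudStorageAccount.parse(sys.env("AZURE_STORAGE_CONNECTION_STRING"))
          val container = account.createCloudBlobClient().getContainerReference("<container>")
          val blob      = container.getBlockBlobReference(
            "Performance_Dataset/PR_DS_1cr.csv/part-00000-1af0b4b3-018e-4847-9441-3e5239c94e33-c000.csv")

          val opts = new BlobRequestOptions()
          opts.setDisableContentMD5Validation(true) // skip the ContentMD5 check that throws above

          val in = blob.openInputStream(null, opts, null) // same download path as BlockBlobInputStream
          try {
            val buf = new Array[Byte](8192)
            while (in.read(buf) != -1) () // drain the stream; succeeds when the MD5 check is off
          } finally in.close()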


      People

        Assignee: Unassigned
        Reporter: Nilesh Patil (nileshpatil1992)
        Votes: 0
        Watchers: 1
