SPARK-28981: Missing library for reading/writing Snappy-compressed files


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Component/s: Kubernetes, Spark Core
    • Labels: None

    Description

      The current Dockerfile for Spark on Kubernetes is missing the "ld-linux-x86-64.so.2" library needed to read/write Snappy-compressed files. The Alpine base image uses musl libc, while the native library bundled with snappy-java is linked against glibc and therefore expects the glibc dynamic loader.
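
      For reference, a minimal way to trigger the failure (a sketch; the output path is arbitrary, and any plain Parquet write exercises Snappy since it is the default value of spark.sql.parquet.compression.codec):

          // Run in spark-shell on the affected Alpine-based image.
          // Writing Parquet with the default codec loads the snappy-java
          // native library and fails with the UnsatisfiedLinkError below.
          spark.range(10).write.mode("overwrite").parquet("/tmp/snappy-test")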

       

      Sample error message when trying to write a Parquet file compressed with Snappy (the task fails in SnappyCompressor on the write path):

       

      19/09/02 05:33:19 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, 172.30.189.77, executor 2): org.apache.spark.SparkException: Task failed while writing rows.    
          at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:257)    
          at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)    
          at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)    
          at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)    
          at org.apache.spark.scheduler.Task.run(Task.scala:121)    
          at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)    
          at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)    
          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)    
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)    
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)    
          at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.7-04145e2f-cc82-4217-99b8-641cdd755a87-libsnappyjava.so: Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /tmp/snappy-1.1.7-04145e2f-cc82-4217-99b8-641cdd755a87-libsnappyjava.so)    
          at java.lang.ClassLoader$NativeLibrary.load(Native Method)    
          at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)    
          at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)    
          at java.lang.Runtime.load0(Runtime.java:809)    
          at java.lang.System.load(System.java:1086)    
          at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:179)    
          at org.xerial.snappy.SnappyLoader.loadSnappyApi(SnappyLoader.java:154)    
          at org.xerial.snappy.Snappy.<clinit>(Snappy.java:47)    
          at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)    
          at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)    
          at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)    
          at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:165)    
          at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:95)    
          at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)    
          at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)    
          at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)    
          at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)    
          at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)    
          at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:165)    
          at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetOutputWriter.scala:42)    
          at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:57)    
          at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:74)    
          at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)    
          at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)    
          at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)    
          at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)    
          ... 10 more
      

      The relevant library is provided by the Alpine Linux "gcompat" package (https://pkgs.alpinelinux.org/package/edge/community/x86/gcompat), a glibc compatibility layer for musl-based systems. Adding it to the Dockerfile enables the reading/writing of Snappy-compressed files.
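
      A minimal sketch of the change (the base image tag is illustrative; only the gcompat line is the proposed addition):

          # Illustrative Alpine-based base image.
          FROM openjdk:8-jdk-alpine

          # gcompat supplies the glibc dynamic loader (ld-linux-x86-64.so.2)
          # expected by the bundled snappy-java native library; musl-based
          # Alpine images do not ship it.
          RUN apk add --no-cache gcompat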

       


    People

    • Assignee: Unassigned
    • Reporter: Paul Schweigert (psschwei)
    • Votes: 0
    • Watchers: 2
