Hive / HIVE-27105

Querying parquet table with zstd compression is failing

    Description

      Steps to reproduce locally, with either upstream or downstream code:

      • Start HiveServer2
      • Start beeline
      • Run
        set hive.execution.engine=tez;
      • Run
        CREATE TABLE emp(id int, name string, department string, salary float) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS PARQUET TBLPROPERTIES ("parquet.compression"="zstd");
      • Run
        INSERT INTO emp VALUES (1, 'some name', 'some dept', 1.15);
      • The statement fails with
        java.lang.NoClassDefFoundError: com/github/luben/zstd/RecyclingBufferPool

      Java call stack for the error:

      java.lang.NoClassDefFoundError: com/github/luben/zstd/RecyclingBufferPool
      E   	at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
      E   	at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
      E   	at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:111)
      E   	at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.<init>(BaseVectorizedColumnReader.java:137)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.<init>(VectorizedPrimitiveColumnReader.java:58)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:515)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:446)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:406)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:347)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:95)
      E   	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:376)
      E   	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:82)
      E   	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:119)
      E   	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:59)
      E   	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
      E   	at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
      E   	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
      E   	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
      E   	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
      E   	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
      E   	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
      E   	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
      E   	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
      E   	at java.security.AccessController.doPrivileged(Native Method)
      E   	at javax.security.auth.Subject.doAs(Subject.java:422)
      E   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
      E   	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
      E   	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
      E   	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      E   	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
      E   	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
      E   	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
      E   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      E   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      E   	at java.lang.Thread.run(Thread.java:748)
      E   Caused by: java.lang.ClassNotFoundException: com.github.luben.zstd.RecyclingBufferPool
      E   	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
      E   	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
      E   	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
      E   	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
      E   	... 36 more
      E   , errorMessage=Cannot recover from this error:java.lang.NoClassDefFoundError: com/github/luben/zstd/RecyclingBufferPool
      E   	at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
      E   	at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
      E   	at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:111)
      E   	at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.<init>(BaseVectorizedColumnReader.java:137)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.<init>(VectorizedPrimitiveColumnReader.java:58)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:515)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:446)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:406)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:347)
      E   	at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:95)
      E   	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:376)
      E   	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:82)
      E   	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:119)
      E   	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:59)
      E   	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
      E   	at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
      E   	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
      E   	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
      E   	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
      E   	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
      E   	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
      E   	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
      E   	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
      E   	at java.security.AccessController.doPrivileged(Native Method)
      E   	at javax.security.auth.Subject.doAs(Subject.java:422)
      E   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
      E   	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
      E   	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
      E   	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      E   	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
      E   	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
      E   	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
      E   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      E   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      E   	at java.lang.Thread.run(Thread.java:748)
      E   Caused by: java.lang.ClassNotFoundException: com.github.luben.zstd.RecyclingBufferPool
      E   	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
      E   	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
      E   	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
      E   	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
      E   	... 36 more

      This is what happens during the execution of this insert statement:

      1. Hive submits the statement to Tez for execution on the YARN cluster.
      2. Tez runs code from the hive-exec jar. The Tez classpath contains the Tez jars, the Tez lib jars, and hive-exec, but not zstd-jni.jar.
      3. The Hive code calls the Parquet classes that are shaded into hive-exec.
      4. Parquet fails because the zstd-jni classes cannot be found at runtime.
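
      The classpath gap described in the steps above can be checked directly from a JVM started with the same classpath as the failing Tez task. The following is only a minimal diagnostic sketch: the class name ZstdClasspathCheck and the helper locate are hypothetical and are not part of Hive, Tez, or Parquet. It assumes only the class name reported in the stack trace, com.github.luben.zstd.RecyclingBufferPool, and uses the standard Class.forName lookup to report where that class would be loaded from, if anywhere.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Hive, Tez, or Parquet).
public class ZstdClasspathCheck {

    /**
     * Returns the location a class would be loaded from,
     * or null if the class cannot be resolved on the current classpath.
     */
    public static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> c = Class.forName(className, false,
                    ZstdClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source
            return src == null ? "(bootstrap/JDK)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the NoClassDefFoundError above
        String name = args.length > 0
                ? args[0]
                : "com.github.luben.zstd.RecyclingBufferPool";
        String where = locate(name);
        System.out.println(where == null
                ? name + " is NOT on the classpath"
                : name + " loaded from " + where);
    }
}
```

      Running this with a classpath made up of the Tez jars, the Tez lib jars, and hive-exec should, under the analysis above, report the class as missing. Adding zstd-jni.jar to the same classpath should make it resolve.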

            People

              difin Dmitriy Fingerman

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 40m