  1. ORC
  2. ORC-991

Encrypted data throws an exception with SQL filter pushdown

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.7.0, 1.6.8, 1.6.9, 1.6.10
    • Fix Version/s: 1.7.0, 1.6.11
    • Component/s: Java
    • Labels: None
    • Environment:
      1. ORC 1.6.8+
      2. SparkSQL 2.4.7
      3. JDK 1.8

    Description

      1. Create a table with encrypted columns

      CREATE TABLE `itmp8888`(`id` INT, `name` STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
      WITH SERDEPROPERTIES (
      'serialization.format' = '1'
      )
      STORED AS
      INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
      TBLPROPERTIES (
      'transient_lastDdlTime' = '1631174384',
      'orc.encrypt' = 'AES_CTR_128:id,name',
      'orc.mask' = 'sha256:id,name',
      'orc.encrypt.ezk' = 'jNCeDBtNfT8wPaTpR34JHA=='
      )

      2. Insert data

      3. A select statement with no filter works fine

         select * from itmp8888

      4. A select statement with a filter on an encrypted column throws an exception

        select * from itmp8888 where id = 1

       

      5. The stack trace

      Caused by: java.lang.AssertionError: Index is not populated for 1
          at org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:995)
          at org.apache.orc.impl.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:1083)
          at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1101)
          at org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1151)
          at org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1186)
          at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:248)
          at org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:864)
          at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initialize(OrcColumnarBatchReader.java:142)
          at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(OrcFileFormat.scala:211)
          at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(OrcFileFormat.scala:175)
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)

      6. While debugging the code, I found that the RowIndex is null for all the encrypted columns
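
The failure mode in steps 4 to 6 can be illustrated with a small, self-contained sketch. This is a simplified model, not the ORC source: the class `RowIndexCheckSketch`, the type `ColumnIndex`, and this `pickRowGroups` signature are hypothetical names invented for illustration. It assumes the flat schema from step 1, where column id 0 is the root struct, 1 is `id`, and 2 is `name`, and it shows how a pushed-down predicate on a column whose index entry is null would produce the message seen in the stack trace.

```java
import java.util.Arrays;

// Hypothetical, simplified model of row-group selection; not an ORC class.
class RowIndexCheckSketch {

    // Stand-in for a per-column row index: min/max value per row group.
    static final class ColumnIndex {
        final long[] mins;
        final long[] maxs;

        ColumnIndex(long[] mins, long[] maxs) {
            this.mins = mins;
            this.maxs = maxs;
        }
    }

    // Evaluates "column == value" against each row group's min/max range.
    // Fails if the index for the referenced column was never loaded.
    static boolean[] pickRowGroups(ColumnIndex[] indexes, int columnId, long value) {
        ColumnIndex index = indexes[columnId];
        if (index == null) {
            // Same message shape as in the reported stack trace.
            throw new AssertionError("Index is not populated for " + columnId);
        }
        boolean[] include = new boolean[index.mins.length];
        for (int g = 0; g < include.length; g++) {
            include[g] = index.mins[g] <= value && value <= index.maxs[g];
        }
        return include;
    }

    public static void main(String[] args) {
        // Unencrypted case (assumed): the index for column 1 (`id`) is present.
        ColumnIndex idIndex = new ColumnIndex(new long[] {0, 100}, new long[] {99, 199});
        ColumnIndex[] populated = {null, idIndex, null};
        System.out.println(Arrays.toString(pickRowGroups(populated, 1, 1L)));

        // Reported case: the index is null for every encrypted column.
        ColumnIndex[] missing = {null, null, null};
        try {
            pickRowGroups(missing, 1, 1L); // models: where id = 1
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the query without a filter never calls `pickRowGroups`, which is consistent with step 3 succeeding while step 4 fails.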

       

      Attachments

        1. files.zip
          225 kB
          hgs

            People

              Assignee: Yiqun Zhang (Guiyankuang)
              Reporter: hgs (hgs19921112)
              Votes: 0
              Watchers: 3