[ORC-991] Encrypted data throws an exception with SQL filter pushdown - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.7.0, 1.6.8, 1.6.9, 1.6.10
Fix Version/s: 1.7.0, 1.6.11
Component/s: Java
Labels: None
Environment:
1. ORC 1.6.8+
2. SparkSQL 2.4.7
3. JDK 1.8

Language:
- JAVA

Description

1. Create a table

CREATE TABLE `itmp8888`(`id` INT, `name` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES (
'transient_lastDdlTime' = '1631174384',
'orc.encrypt' = 'AES_CTR_128:id,name',
'orc.mask' = 'sha256:id,name',
'orc.encrypt.ezk' = 'jNCeDBtNfT8wPaTpR34JHA=='
)

2. Insert data

3. A select statement with no filter works fine

select * from itmp8888

4. A select statement with a filter on an encrypted column throws an exception

select * from itmp8888 where id = 1

5. The stack trace

Caused by: java.lang.AssertionError: Index is not populated for 1
    at org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:995)
    at org.apache.orc.impl.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:1083)
    at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1101)
    at org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1151)
    at org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1186)
    at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:248)
    at org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:864)
    at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initialize(OrcColumnarBatchReader.java:142)
    at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(OrcFileFormat.scala:211)
    at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(OrcFileFormat.scala:175)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)

6. Debugging the code, I found that the RowIndex is null for all of the encrypted columns
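
To illustrate why a null RowIndex leads to the assertion in step 5, here is a minimal sketch of index-based row-group selection. It is a simplified model, not the ORC source: the class `RowGroupPicker`, the method `pickRowGroups`, and the `{min, max}` array layout are hypothetical names chosen for illustration. It assumes the usual ORC column numbering in which the root struct is column 0, so column 1 would be `id`.

```java
// Simplified model of index-based row-group selection.
// NOT the ORC source: RowGroupPicker and pickRowGroups are hypothetical names.
class RowGroupPicker {

    /**
     * indexes[c] holds {min, max} per row group for column c,
     * or null if that column's row index was never loaded.
     * Returns, per row group, whether it may contain `literal`.
     */
    static boolean[] pickRowGroups(long[][][] indexes, int column, long literal) {
        long[][] stats = indexes[column];
        if (stats == null) {
            // Same message shape as the reported AssertionError.
            throw new AssertionError("Index is not populated for " + column);
        }
        boolean[] keep = new boolean[stats.length];
        for (int i = 0; i < stats.length; i++) {
            // A row group can match `column = literal` only if min <= literal <= max.
            keep[i] = stats[i][0] <= literal && literal <= stats[i][1];
        }
        return keep;
    }
}
```

In this model, a filter such as `id = 1` asks for the statistics of column 1. If that slot is null, as observed for the encrypted columns, the lookup fails before any row group is evaluated, whereas a query without a filter never consults the index at all.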

Attachments

files.zip
10/Sep/21 09:02
225 kB
hgs

Issue Links

links to GitHub Pull Request #905

People

Assignee: Yiqun Zhang

Reporter: hgs

Votes: 0

Watchers: 3

Dates

Created: 10/Sep/21 01:42

Updated: 21/Sep/21 20:42

Resolved: 13/Sep/21 02:57