[SPARK-22676] Avoid iterating all partition paths when spark.sql.hive.verifyPartitionPath=true - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.4.0
Component/s: SQL
Labels: None

Description

The current code scans all partition paths when spark.sql.hive.verifyPartitionPath=true.
For example, given a table like the one below:
CREATE TABLE `test`(
`id` int,
`age` int,
`name` string)
PARTITIONED BY (
`A` string,
`B` string)
load data local inpath '/tmp/data1' into table test partition(A='00', B='00')
load data local inpath '/tmp/data1' into table test partition(A='01', B='01')
load data local inpath '/tmp/data1' into table test partition(A='10', B='10')
load data local inpath '/tmp/data1' into table test partition(A='11', B='11')

If I run the query "select * from test where year=2017 and month=12 and day=03", the current code scans all partition paths, including '/data/A=00/B=00', '/data/A=01/B=01', '/data/A=10/B=10', and '/data/A=11/B=11'. This costs a lot of time and memory.
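
To illustrate the idea in the title, here is a minimal sketch of verifying only the partitions a query actually selects, instead of walking every partition directory under the table. It is not Spark's code: the function name `existing_partition_paths`, its arguments, and the use of the local filesystem are all assumptions made for illustration.

```python
import os
import tempfile


def existing_partition_paths(table_root, pruned_specs):
    """Return the paths of the selected partitions that exist on disk.

    Hypothetical helper, not part of Spark. `pruned_specs` holds only the
    partitions that survived pruning, each as a list of (column, value)
    pairs. Every selected partition costs one existence check; the rest of
    the table's directory tree is never listed.
    """
    paths = []
    for spec in pruned_specs:
        # Build e.g. <root>/A=00/B=00 from [("A", "00"), ("B", "00")].
        relative = os.path.join(*[f"{col}={val}" for col, val in spec])
        path = os.path.join(table_root, relative)
        if os.path.isdir(path):
            paths.append(path)
    return paths


def build_example_table():
    """Create a temporary table layout with two partitions and return its root."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "A=00", "B=00"))
    os.makedirs(os.path.join(root, "A=01", "B=01"))
    return root
```

With this shape, the amount of work depends on the number of partitions the query selects rather than on the total number of partitions in the table. A partition that is registered in the metastore but missing on disk is simply dropped from the result.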

Issue Links

links to

[Github] Pull Request #19868 (jinxing64)

[Github] Pull Request #21091 (jinxing64)

People

Assignee: Jin Xing

Reporter: Jin Xing

Votes: 0

Watchers: 3

Dates

Created: 03/Dec/17 06:59

Updated: 08/Jan/20 09:48

Resolved: 17/Apr/18 13:53