Spark / SPARK-22676

Avoid iterating all partition paths when spark.sql.hive.verifyPartitionPath=true


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      In the current code, all partition paths are scanned when spark.sql.hive.verifyPartitionPath=true.
      For example, take the table below:
      CREATE TABLE `test`(
        `id` int,
        `age` int,
        `name` string)
      PARTITIONED BY (
        `A` string,
        `B` string);
      load data local inpath '/tmp/data1' into table test partition(A='00', B='00');
      load data local inpath '/tmp/data1' into table test partition(A='01', B='01');
      load data local inpath '/tmp/data1' into table test partition(A='10', B='10');
      load data local inpath '/tmp/data1' into table test partition(A='11', B='11');

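      If I query it with a partition filter such as "select * from test where A='00' and B='00'", the current code still scans all partition paths: '/data/A=00/B=00', '/data/A=01/B=01', '/data/A=10/B=10' and '/data/A=11/B=11', even though only '/data/A=00/B=00' is needed. This costs a lot of time and memory, and the cost grows with the total number of partitions rather than with the number of partitions the query actually reads.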
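To make the cost difference concrete, here is a minimal Python sketch that creates the four example partition directories under a temporary root and compares two ways of verifying partition paths: listing every entry one wildcard per partition level below the table root, versus checking only the directories of the partitions that survive pruning. This is an illustration under stated assumptions, not Spark's actual implementation: the local filesystem stands in for HDFS, the `A=xx/B=yy` layout follows the path spelling used in this issue, and the helper names (`make_table_dirs`, `verify_by_full_scan`, `verify_by_direct_check`) are hypothetical.

```python
import glob
import os
import tempfile

# Partition values (A, B) from the example table.
SPECS = [("00", "00"), ("01", "01"), ("10", "10"), ("11", "11")]


def make_table_dirs(root):
    """Hypothetical layout: one directory per partition, e.g. root/A=00/B=00."""
    for a, b in SPECS:
        os.makedirs(os.path.join(root, f"A={a}", f"B={b}"), exist_ok=True)


def verify_by_full_scan(root, wanted):
    """List every entry two levels below root (one '*' per partition column),
    then keep only the wanted paths that appeared in the listing.

    The listing touches all partitions, no matter how few are wanted.
    Returns (kept_paths, number_of_paths_listed).
    """
    listed = set(glob.glob(os.path.join(root, "*", "*")))
    return [p for p in wanted if p in listed], len(listed)


def verify_by_direct_check(wanted):
    """Check only the directories of the partitions that survived pruning.

    Returns (kept_paths, number_of_paths_checked).
    """
    return [p for p in wanted if os.path.isdir(p)], len(wanted)


with tempfile.TemporaryDirectory() as root:
    make_table_dirs(root)
    # Assume pruning for A='00' AND B='00' leaves a single partition.
    wanted = [os.path.join(root, "A=00", "B=00")]
    kept, listed = verify_by_full_scan(root, wanted)
    kept_direct, checked = verify_by_direct_check(wanted)
    print(len(kept), listed)         # 1 4
    print(len(kept_direct), checked)  # 1 1
```

Both approaches keep the same single partition, but the full scan lists all four partition directories while the direct check touches only one. With thousands of partitions and a selective filter, that gap is where the time and memory go.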


          People

            Assignee: Jin Xing (jinxing6042@126.com)
            Reporter: Jin Xing (jinxing6042@126.com)
            Votes: 0
            Watchers: 3