[SPARK-36269] Fix only set data columns to Hive column names config - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.3.0
Fix Version/s: 3.2.0, 3.1.3, 3.0.4
Component/s: SQL
Labels:
None

Description

When reading a Hive table, we set the Hive column id and column name configs (`hive.io.file.readcolumn.ids` and `hive.io.file.readcolumn.names`). Both configs should contain only non-partition columns (data columns), because Spark always appends partition columns in its own reader - https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L240 . Currently the column id config contains only non-partition columns, but the column name config contains both partition and non-partition columns. We should make the two consistent by including only non-partition columns in both. This does not cause problems for the public OSS Hive file formats, but it does break customized internal Hive file formats that expect the two configs to describe the same set of columns.
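
To illustrate the intended behavior, here is a minimal Python sketch, not Spark's actual Scala code. The function `build_read_column_conf` and its parameters are hypothetical names introduced for this example, and the comma-separated value format is an assumption about how the two configs are encoded. It filters partition columns out of the requested columns before building both values, so the ids and names always line up one-to-one.

```python
from typing import Dict, Sequence

# Config keys named in the description above.
READ_COLUMN_IDS_KEY = "hive.io.file.readcolumn.ids"
READ_COLUMN_NAMES_KEY = "hive.io.file.readcolumn.names"


def build_read_column_conf(
    data_columns: Sequence[str],
    partition_columns: Sequence[str],
    requested_columns: Sequence[str],
) -> Dict[str, str]:
    """Hypothetical helper: build both configs from data columns only.

    Column ids are positions within the table's data (non-partition) schema.
    Values are joined with commas (assumed encoding).
    """
    partition_set = set(partition_columns)
    index_of = {name: i for i, name in enumerate(data_columns)}

    # Drop partition columns first, so ids and names are derived
    # from exactly the same list.
    needed = [c for c in requested_columns if c not in partition_set]
    ids = [index_of[c] for c in needed]

    return {
        READ_COLUMN_IDS_KEY: ",".join(str(i) for i in ids),
        READ_COLUMN_NAMES_KEY: ",".join(needed),
    }


# Example: table with data columns (id, name, value) partitioned by dt.
conf = build_read_column_conf(
    data_columns=["id", "name", "value"],
    partition_columns=["dt"],
    requested_columns=["name", "dt", "id"],
)
```

In this sketch the partition column `dt` appears in neither value, so a file format that zips the two configs together sees the same number of entries in each.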

Issue Links

links to

[Github] Pull Request #33489 (c21)

People

Assignee: Cheng Su

Reporter: Cheng Su

Votes: 0

Watchers: 3

Dates

Created: 23/Jul/21 04:12

Updated: 26/Jul/21 10:50

Resolved: 26/Jul/21 10:50