[IMPALA-11469] Ignore _spark_metadata folder in table location - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Impala 4.2.0
Component/s: Backend
Labels: None

Description

When Spark Streaming is used to write Parquet files to an external table, a _spark_metadata folder is created inside the table's directory. Hive can handle this directory, but Impala trips on it.

As a result, REFRESH on the table fails because it encounters a directory containing data that Impala cannot handle. A SELECT also fails, as it trips on the _spark_metadata folder.
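
To illustrate the behavior requested here, the following is a minimal sketch of a file listing that skips the _spark_metadata folder while walking a table location. It is not Impala's actual code: the function name `list_data_files` and the `IGNORED_DIR_NAMES` set are hypothetical, and it walks a local directory rather than HDFS or object storage.

```python
import os

# Hypothetical set of directory names to skip during file listing.
# Only _spark_metadata comes from this issue; the set itself is an assumption.
IGNORED_DIR_NAMES = {"_spark_metadata"}


def list_data_files(table_location):
    """Return sorted paths of all files under table_location,
    skipping any directory whose name is in IGNORED_DIR_NAMES."""
    data_files = []
    for root, dirs, files in os.walk(table_location):
        # Prune ignored directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if d not in IGNORED_DIR_NAMES]
        for name in files:
            data_files.append(os.path.join(root, name))
    return sorted(data_files)
```

With a listing like this, the metadata files written by Spark are never treated as table data, while Parquet files in the table directory and its partition subdirectories are still returned.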

The issue was found in CDP 7.1.7 SP1, but I suspect it affects all versions.

Regards Matthias

Issue Links

links to

Code Review

People

Assignee: Quanlong Huang

Reporter: Matthias Wies

Votes: 0

Watchers: 4

Dates

Created: 02/Aug/22 17:21

Updated: 04/Nov/22 22:59

Resolved: 04/Aug/22 21:49