Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
When Spark Streaming is used to write Parquet files to an external table, a folder named _spark_metadata is created inside the table's directory. Hive can deal with this directory, but Impala trips on it.
As a result, REFRESH TABLE does not work, because Impala sees a directory containing data it cannot cope with. A SELECT also fails, as it trips on the _spark_metadata folder.
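To illustrate the behaviour being asked for, here is a minimal sketch of a file listing that skips hidden entries. It assumes the common Hadoop convention that names starting with "_" or "." are treated as hidden and ignored by readers. The function names (is_hidden, list_data_files, build_demo_table) are hypothetical and this is not Impala's actual code.

```python
import os
import tempfile


def is_hidden(name):
    # Assumption: follow the Hadoop-style convention that entries whose
    # names start with "_" or "." are not table data.
    return name.startswith("_") or name.startswith(".")


def list_data_files(table_dir):
    """Return data file paths relative to table_dir, skipping hidden entries."""
    data_files = []
    for root, dirs, files in os.walk(table_dir):
        # Prune hidden directories in place so os.walk never descends
        # into them (for example _spark_metadata).
        dirs[:] = [d for d in dirs if not is_hidden(d)]
        for name in files:
            if not is_hidden(name):
                full_path = os.path.join(root, name)
                data_files.append(os.path.relpath(full_path, table_dir))
    return sorted(data_files)


def build_demo_table(base_dir):
    # Hypothetical layout: two Parquet files plus the metadata folder
    # that a streaming writer leaves behind.
    os.makedirs(os.path.join(base_dir, "_spark_metadata"))
    relative_paths = (
        "part-00000.parquet",
        "part-00001.parquet",
        os.path.join("_spark_metadata", "0"),
    )
    for rel in relative_paths:
        with open(os.path.join(base_dir, rel), "w") as handle:
            handle.write("")
    return base_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(list_data_files(build_demo_table(tmp)))
```

With a filter of this kind, the metadata folder is never visited, so a listing of the table directory only returns the Parquet data files.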
The issue was found in CDP 7.1.7 SP1, but I suspect it is present in all versions.
Regards, Matthias