[SPARK-18700] getCached in HiveMetastoreCatalog not thread safe cause driver OOM - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.1, 2.0.0, 2.1.0
Fix Version/s: 2.0.3, 2.1.1, 2.2.0
Component/s: SQL
Labels: None

Description

In our Spark SQL platform, every query runs in its own thread but shares a single HiveContext, and new data is appended to tables as new partitions every 30 minutes. After a new partition is added to table T, we call refreshTable to clear T’s entry in cachedDataSourceTables so that the new partition becomes searchable.
For a table with many partitions and files (far more than spark.sql.sources.parallelPartitionDiscovery.threshold), the first query on table T after the refresh starts a job to fetch every FileStatus in the listLeafFiles function. Because of the huge number of files, that job runs for several seconds. During that time, every other new query on table T also starts its own job to fetch the same FileStatus objects, because getCached is not thread safe. The end result is a driver OOM.
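
The race described above can be illustrated outside Spark. The following is only a minimal sketch in Python, not Spark's code and not necessarily what the linked pull requests do: `SingleFlightCache`, `slow_listing`, and `load_calls` are hypothetical names, and `slow_listing` merely stands in for the expensive file listing. It shows one common remedy, a per-key lock, so that concurrent readers of the same table wait for a single load instead of each starting their own.

```python
import threading
import time

_MISSING = object()  # sentinel meaning "no cached value"


class SingleFlightCache:
    """Hypothetical cache: at most one thread loads the value for a given key."""

    def __init__(self, loader):
        self._loader = loader
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects _key_locks only

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key):
        # Fast path: value already cached, no per-key lock needed.
        value = self._values.get(key, _MISSING)
        if value is not _MISSING:
            return value
        with self._lock_for(key):
            # Re-check under the lock: another thread may have loaded it
            # while this one was waiting.
            value = self._values.get(key, _MISSING)
            if value is _MISSING:
                value = self._loader(key)
                self._values[key] = value
            return value

    def invalidate(self, key):
        # Plays the role of refreshTable: drop the cached entry for one key.
        with self._lock_for(key):
            self._values.pop(key, None)


load_calls = []  # records every invocation of the expensive loader


def slow_listing(table):
    # Stand-in for the expensive file listing; sleeps to widen the race window.
    load_calls.append(table)
    time.sleep(0.05)
    return [f"{table}/part-{i}" for i in range(3)]


cache = SingleFlightCache(slow_listing)

# Eight concurrent "queries" on the same table right after a refresh.
threads = [threading.Thread(target=cache.get, args=("T",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(load_calls))  # prints 1: only one thread ran the listing
```

Without the per-key lock, each of the eight threads could miss the cache and run the listing itself, which is the behaviour reported here. Locking per key rather than globally keeps queries on other tables from blocking behind a slow listing of T.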

Issue Links

links to

[Github] Pull Request #16135 (xuanyuanking)

[Github] Pull Request #16350 (xuanyuanking)

People

Assignee: Yuanjian Li

Reporter: Yuanjian Li

Votes: 0

Watchers: 5

Dates

Created: 03/Dec/16 19:01

Updated: 21/Dec/16 21:56

Resolved: 19/Dec/16 19:40