Description
A Spark executor usually has multiple cores, so multiple map-join tasks may run in the same executor (concurrently or sequentially). Currently, each task loads its own copy of the small tables for the map join into memory, which wastes both memory and load time. Ideally, we should load the small tables only once and share them among all tasks running in that executor. A sketch of one possible approach follows.
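A minimal sketch of the idea, assuming a per-JVM cache keyed by table: since all tasks in an executor share one JVM, a static ConcurrentHashMap with computeIfAbsent guarantees the loader runs at most once per table, even under concurrent requests. The class and method names below (SmallTableCache, getOrLoad, evict, the cache key scheme) are illustrative, not actual Hive APIs.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-executor cache: tasks in the same executor JVM
// share one copy of each loaded small table.
public final class SmallTableCache {

    // One entry per small table, keyed by a unique identifier,
    // e.g. query id plus the table's HDFS path.
    private static final ConcurrentHashMap<String, Object> CACHE =
            new ConcurrentHashMap<>();

    private SmallTableCache() {}

    // computeIfAbsent is atomic: the loader runs at most once per key,
    // even when several map-join tasks request the same table concurrently.
    @SuppressWarnings("unchecked")
    public static <T> T getOrLoad(String key, Function<String, T> loader) {
        return (T) CACHE.computeIfAbsent(key, loader);
    }

    // Called when the query finishes so the executor can reclaim memory.
    public static void evict(String queryIdPrefix) {
        CACHE.keySet().removeIf(k -> k.startsWith(queryIdPrefix));
    }
}
{code}

A map-join task would then call something like {{SmallTableCache.getOrLoad(queryId + path, p -> loadHashTable(p))}} instead of loading the table itself; eviction at query end is needed so long-lived executors do not accumulate stale tables.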
Issue Links
- relates to HIVE-8851 Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch] (Open)