[HIVE-8262] Create CacheTran that transforms the input RDD by caching it [Spark Branch] - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: Spark
Labels: None

Description

In a few cases we need to cache an RDD to avoid recomputing it, for better performance. However, caching a map input RDD is different from caching a regular RDD because of SPARK-3693. To cache a Hadoop RDD, which is the input to MapWork, we cache the result RDD obtained from the original Hadoop RDD by applying a map function that copies the <key, value> pairs. Caching an intermediate RDD, such as one produced by a shuffle, only requires calling .cache().

This task is to create a CacheTran that captures this logic and can be plugged into the Spark plan wherever caching is desirable.
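The need to copy the pairs before caching can be illustrated without Spark. The sketch below is plain Java and assumes that the underlying problem is a record reader that reuses one mutable object for every record, which is the usual reason a copying map function is needed before caching a Hadoop RDD. `MutableText`, `reusingReader`, and the two `cache...` helpers are hypothetical names invented for this illustration; they are not part of Hive, Hadoop, or Spark, and this is not the CacheTran implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CopyBeforeCache {

    /** Hypothetical stand-in for a mutable Hadoop Writable (for example, Text). */
    static final class MutableText {
        private String value = "";

        void set(String v) { value = v; }

        MutableText copy() {
            MutableText t = new MutableText();
            t.set(value);
            return t;
        }

        @Override public String toString() { return value; }
    }

    /** Mimics a record reader that hands back one shared object for every record. */
    static Iterator<MutableText> reusingReader(List<String> rows) {
        final MutableText shared = new MutableText();
        final Iterator<String> source = rows.iterator();
        return new Iterator<MutableText>() {
            @Override public boolean hasNext() { return source.hasNext(); }

            @Override public MutableText next() {
                shared.set(source.next()); // overwrites the previous record in place
                return shared;
            }
        };
    }

    /** "Caches" the references as they come: every entry aliases the same object. */
    static List<String> cacheWithoutCopy(List<String> rows) {
        List<MutableText> cached = new ArrayList<>();
        Iterator<MutableText> reader = reusingReader(rows);
        while (reader.hasNext()) {
            cached.add(reader.next());
        }
        return render(cached);
    }

    /** Copies each record first, as the map function in the description would. */
    static List<String> cacheWithCopy(List<String> rows) {
        List<MutableText> cached = new ArrayList<>();
        Iterator<MutableText> reader = reusingReader(rows);
        while (reader.hasNext()) {
            cached.add(reader.next().copy());
        }
        return render(cached);
    }

    static List<String> render(List<MutableText> cached) {
        List<String> out = new ArrayList<>();
        for (MutableText t : cached) {
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        System.out.println(cacheWithoutCopy(rows)); // prints [c, c, c]
        System.out.println(cacheWithCopy(rows));    // prints [a, b, c]
    }
}
```

Without the copy, every cached entry points at the same object and therefore shows the last record read. With the copy, each entry keeps its own value, which is the behavior the map function in front of .cache() is meant to restore.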

Attachments

HIVE-8262.1-spark.patch (3 kB), uploaded 30/Sep/14 18:04 by Chao Sun

Issue Links

links to

RB Link

People

Assignee: Chao Sun

Reporter: Xuefu Zhang

Votes: 0

Watchers: 1

Dates

Created: 25/Sep/14 22:43

Updated: 12/Oct/14 01:05

Resolved: 12/Oct/14 01:05