[HIVE-8457] MapOperator initialization fails when multiple Spark threads are enabled [Spark Branch] - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.0
Component/s: Spark
Labels: None

Description

Currently, on the Spark branch, each thread is bound to a thread-local IOContext, which is initialized when we generate an input HadoopRDD and is later used by MapOperator, FilterOperator, etc.

With the introduction of HIVE-8118, multiple downstream RDDs may share the same input HadoopRDD, and we would like the HadoopRDD to be cached to avoid scanning the same table multiple times. A typical case looks like the following:

     inputRDD     inputRDD
        |            |
       MT_11        MT_12
        |            |
       RT_1         RT_2

Here, MT_11 and MT_12 are MapTrans from a split MapWork, and RT_1 and RT_2 are two ReduceTrans. Note that this example is simplified, as there may also be a ShuffleTran between a MapTran and a ReduceTran.

When multiple Spark threads are running, MT_11 may be executed first. It asks the HadoopRDD for an iterator, which triggers the creation of the iterator, which in turn triggers the initialization of the IOContext associated with that particular thread.

Now, the problem is: before MT_12 starts executing, it also asks the HadoopRDD for an iterator, and since the RDD is already cached, it fetches the cached result instead of creating a new iterator. This skips the initialization of the IOContext associated with MT_12's thread. When MT_12 then starts executing and tries to initialize the MapOperator, the initialization fails because the IOContext for that thread was never initialized.
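
The failure mode can be reproduced in isolation. The following is a minimal sketch, not the actual Hive or Spark code: `IOContextSketch`, `CachedInput`, `initMapOperator`, `runMapTran`, and the path `/example/table` are all hypothetical stand-ins, assuming only that the context is thread-local and is populated as a side effect of creating the input iterator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class IOContextSketch {

    /** Hypothetical stand-in for Hive's per-thread IOContext. */
    static final class IOContext {
        String inputPath; // stays null until an input iterator is created on this thread
    }

    static final ThreadLocal<IOContext> CONTEXT = ThreadLocal.withInitial(IOContext::new);

    /** Hypothetical stand-in for a cached input HadoopRDD. */
    static final class CachedInput {
        private List<String> cachedRows;

        synchronized List<String> rows() {
            if (cachedRows == null) {
                // Cache miss: "creating the iterator" initializes the context,
                // but only for the thread that happens to run this branch.
                CONTEXT.get().inputPath = "/example/table"; // illustrative value
                cachedRows = Arrays.asList("row1", "row2");
            }
            // Cache hit: the rows are returned and the caller's context is untouched.
            return cachedRows;
        }
    }

    /** Hypothetical stand-in for MapOperator initialization, which needs the context. */
    static boolean initMapOperator() {
        return CONTEXT.get().inputPath != null;
    }

    /** Runs one "MapTran" on its own thread and reports whether initialization succeeded. */
    static boolean runMapTran(CachedInput input) {
        AtomicBoolean ok = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            input.rows();              // ask the shared input for its data
            ok.set(initMapOperator()); // then try to initialize the operator
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ok.get();
    }

    public static void main(String[] args) {
        CachedInput input = new CachedInput();
        System.out.println("MT_11 initialized: " + runMapTran(input)); // true
        System.out.println("MT_12 initialized: " + runMapTran(input)); // false
    }
}
```

In this sketch the second thread sees an empty context because the initialization is tied to the cache miss rather than to the consuming thread, which mirrors the MT_12 case described above.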

Attachments

HIVE-8457.1-spark.patch (5 kB, Chao Sun, 23/Oct/14 23:53)
HIVE-8457.2-spark.patch (6 kB, Chao Sun, 24/Oct/14 16:51)

Issue Links

is depended upon by: HIVE-8437 Modify SparkPlan generation to set toCache flag to SparkTrans where caching is needed [Spark Branch] (Resolved)

is related to: HIVE-8118 Support work that have multiple child works to work around SPARK-3622 [Spark Branch] (Resolved)

links to

RB Link

People

Assignee: Chao Sun

Reporter: Chao Sun

Votes: 0

Watchers: 3

Dates

Created: 14/Oct/14 17:37

Updated: 29/May/15 02:32

Resolved: 24/Oct/14 18:24