Description
One of the great potential advantages of using Scala for writing MapReduce pipelines is the ability to send side data as part of function closures, rather than through Hadoop Configurations or the Distributed Cache. As an absurdly simple example, consider the following Scala PipelineApp that divides all elements of a numeric PCollection by an arbitrary argument:
object DivideApp extends PipelineApp {
  val divisor = Integer.valueOf(args(0))
  val nums = read(From.textFile("numbers.txt"))
  val dividedNums = nums.map(_.toInt / divisor)
  dividedNums.write(To.textFile("dividedNums"))
  run()
}
Executing this PipelineApp fails: the MapReduce tasks see a value of null for divisor (or 0 if divisor is forced to a primitive numeric type). This indicates an error in the serialization of Scala function closures that causes unbound variables in the closure to take on their default JVM values.
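The behavior the pipeline depends on can be checked outside MapReduce by round-tripping a closure through plain Java serialization. The sketch below is illustrative only (ClosureDemo and roundTrip are not part of Scrunch): a closure that captures a local val carries that value across serialization, whereas the bug above suggests that a val belonging to the enclosing PipelineApp object does not survive the trip. Assuming that is the cause, copying the object field into a local val before calling map is a common defensive pattern.

```scala
import java.io._

object ClosureDemo {
  // Serialize an object to bytes and read it back, as a stand-in for
  // shipping a closure to a remote MapReduce task.
  def roundTrip[T](obj: T): T = {
    val bos = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bos)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    // A local val is copied into the closure itself, so it survives
    // serialization intact.
    val divisor = 2
    val divide: Int => Int = n => n / divisor
    val revived = roundTrip(divide)
    println(revived(10)) // prints 5
  }
}
```

If the same closure instead reads a field of the enclosing object (as DivideApp's map function reads divisor), the serialized closure holds a reference to that object, and any failure to serialize or restore the object's fields would explain the null/0 values observed in the tasks.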