[BIGTOP-1414] Add Apache Spark implementation to BigPetStore - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: backlog
Fix Version/s: 1.0.0
Component/s: blueprints
Labels: None

Description

Currently we only process data with Hadoop. Now it's time to add Spark to the BigPetStore application. This will demonstrate the difference between a MapReduce-based Hadoop implementation of a big data app and a Spark one.

We will need to:

Update the Graphviz arch.dot to diagram Spark as a new path.
Add a Spark job to the existing code, in a new package, which uses the existing Scala-based generator. However, we will use the generator inside a Spark job rather than in a Hadoop InputSplit.
Have the job output to an RDD, which can then be serialized to disk or fed into the next Spark job.

So, the next Spark job should:

Group the data and write product summaries to a local file.
Run a product recommender against the input data set.
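As a rough illustration of the grouping step above, here is a minimal sketch in plain Python, with an ordinary list standing in for an RDD. The record layout `(customer_id, product, price)`, the sample data, the function names, and the output file name are all assumptions made for this example; none of them come from the BigPetStore code. The recommender step is not sketched here.

```python
import csv
import os
import tempfile
from collections import defaultdict

# Hypothetical transaction records: (customer_id, product, price).
transactions = [
    (1, "dog-food", 10.0),
    (2, "cat-litter", 7.5),
    (1, "cat-litter", 7.5),
    (3, "dog-food", 10.0),
    (3, "dog-food", 10.0),
]

def summarize_by_product(records):
    """Group records by product and return {product: (purchases, revenue)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for _customer, product, price in records:
        totals[product][0] += 1
        totals[product][1] += price
    return {product: (count, revenue) for product, (count, revenue) in totals.items()}

def write_summaries(summaries, path):
    """Write one CSV row per product to a local file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["product", "purchases", "revenue"])
        for product in sorted(summaries):
            count, revenue = summaries[product]
            writer.writerow([product, count, revenue])

summaries = summarize_by_product(transactions)
# Assumed output location: a file in the system temp directory.
summary_path = os.path.join(tempfile.gettempdir(), "product_summaries.csv")
write_summaries(summaries, summary_path)
```

In a Spark job the same grouping would more likely be expressed as a keyed aggregation over an RDD, but the shape of the result (one summary row per product) would be the same.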

We want the jobs to be runnable either as separate modules or as a single job, to leverage the RDD paradigm.
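One way to picture the "modular or single job" requirement is to write each stage as a function over a dataset, so the stages can either be chained in memory or joined through a file on disk. The sketch below uses plain Python lists and JSON files in place of RDDs and Spark serialization; the stage names (`generate`, `count_products`), the record fields, and the file name are hypothetical and chosen only for illustration.

```python
import json
import os
import tempfile

def generate(n):
    """Stage 1 (hypothetical generator): produce n purchase records."""
    products = ["dog-food", "cat-litter", "fish-flakes"]
    return [{"customer": i % 4, "product": products[i % 3]} for i in range(n)]

def count_products(records):
    """Stage 2: count purchases per product."""
    counts = {}
    for record in records:
        counts[record["product"]] = counts.get(record["product"], 0) + 1
    return counts

def save(records, path):
    """Serialize a stage's output so a later run can pick it up."""
    with open(path, "w") as f:
        json.dump(records, f)

def load(path):
    """Read back a previously serialized stage output."""
    with open(path) as f:
        return json.load(f)

# Single job: stage 2 consumes stage 1's output directly, in memory.
in_memory = count_products(generate(9))

# Modular: stage 1 writes to disk, and stage 2 reads it back as a separate step.
stage_path = os.path.join(tempfile.gettempdir(), "generated_records.json")
save(generate(9), stage_path)
from_disk = count_products(load(stage_path))
```

Because both paths call the same stage functions, the two ways of running should produce identical results; only the hand-off between stages differs.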

So it will be interesting to see how the code is architected. Let's start the planning in this JIRA. I have some stuff I've informally hacked together; maybe I can attach an initial patch just to start a dialog.

Attachments

chart.png (54 kB), 27/Aug/14 23:02, Jörn Franke

Issue Links

incorporates

BIGTOP-1535 Add Spark ETL script to BigPetStore

Resolved

is blocked by

BIGTOP-1366 Updated, Richer Model for Generating Data for BigPetStore

Resolved

Sub-Tasks

There are no Sub-Tasks for this issue.

Activity

People

Assignee: jay vyas

Reporter: jay vyas

Votes: 0

Watchers: 3

Dates

Created: 22/Aug/14 01:51

Updated: 18/Mar/15 22:47

Resolved: 04/Feb/15 00:17