[MAPREDUCE-1270] Hadoop C++ Extention - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.20.1
Fix Version/s: None
Component/s: task
Labels: None
Environment: hadoop linux
Hadoop Flags: Incompatible change

Description

Hadoop C++ Extension (HCE) is an internal project at Baidu. We started it for these reasons:
1 To provide a C++ API. We mostly used Streaming before, and we also tried PIPES, but we did not find PIPES to be more efficient than Streaming, so we think a new C++ extension is needed.
2 Even with PIPES or Streaming, it is hard to control the memory of the Hadoop map/reduce Child JVM.
3 Reading, writing, and sorting TB/PB-scale data in Java is costly. With PIPES or Streaming, a pipe or socket is not efficient enough to carry that much data.

What we want to do:
1 We do not use the map/reduce Child JVM for any data processing. It only prepares the environment, starts the C++ mapper, tells the mapper which split to process, and reads reports from the mapper until it finishes. The mapper reads records, invokes the user-defined map, partitions the output, writes spills, combines, and merges into file.out. We think these operations can be done in C++ code.
2 The reducer is similar to the mapper. It is started after the sort finishes, reads from the sorted files, invokes the user-defined reduce, and writes to the user-defined record writer.
3 We also intend to rewrite shuffle and sort in C++, for efficiency and memory control.
We will do 1 and 2 first, then 3.
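
The map-side flow in item 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HCE interface: the names `runMapTask`, `partitionOf`, `Emit`, `MapFn`, `CombineFn`, and `wordCountMap` are hypothetical, the split is an in-memory vector of records, and a `std::map` per partition stands in for the real sort, spill, and merge into file.out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not the HCE API.
using Emit = std::function<void(const std::string& key, long value)>;
using MapFn = std::function<void(const std::string& record, const Emit& emit)>;
using CombineFn = std::function<long(long, long)>;

// Pick a partition for a key. std::hash stands in for the real partitioner.
std::size_t partitionOf(const std::string& key, std::size_t numPartitions) {
    assert(numPartitions > 0);
    return std::hash<std::string>{}(key) % numPartitions;
}

// Process one split entirely in native code: read each record, invoke the
// user-defined map, partition the output, and combine values per key.
// Each std::map keeps its partition sorted by key, which stands in for the
// sort/spill/merge steps a real implementation would perform on disk.
std::vector<std::map<std::string, long>> runMapTask(
    const std::vector<std::string>& split,
    std::size_t numPartitions,
    const MapFn& mapFn,
    const CombineFn& combineFn) {
    std::vector<std::map<std::string, long>> partitions(numPartitions);
    Emit emit = [&](const std::string& key, long value) {
        auto& part = partitions[partitionOf(key, numPartitions)];
        auto it = part.find(key);
        if (it == part.end()) {
            part.emplace(key, value);
        } else {
            it->second = combineFn(it->second, value);  // combine in place
        }
    };
    for (const auto& record : split) {
        mapFn(record, emit);
    }
    return partitions;
}

// Example user-defined map: emit (word, 1) for each whitespace-separated word.
void wordCountMap(const std::string& record, const Emit& emit) {
    std::istringstream in(record);
    std::string word;
    while (in >> word) {
        emit(word, 1);
    }
}
```

Calling `runMapTask(split, 2, wordCountMap, [](long a, long b) { return a + b; })` returns one sorted, combined map per partition. In the design described above, the Child JVM would only launch this native task and read its progress reports.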

How does it differ from PIPES?
1 We will reuse most of the PIPES code.
2 We intend to go further: nothing changes in scheduling and management, but everything changes in execution.

UPDATE:

A test version of HCE is now available at this link: http://docs.google.com/leaf?id=0B5xhnqH1558YZjcxZmI0NzEtODczMy00NmZiLWFkNjAtZGM1MjZkMmNkNWFk&hl=zh_CN&pli=1
It is a full package that includes all of the Hadoop source code.
Follow the attached document "HCE InstallMenu.pdf" to build and deploy it on your cluster.

The attachment "HCE Tutorial.pdf" walks you through writing your first HCE program and gives the other specifications of the interface.

The attachment "HCE Performance Report.pdf" reports the performance of HCE compared to Java MapRed and Pipes.

Any comments are welcome.

Attachments

HADOOP-HCE-1.0.0.patch, 23/Jul/10 10:17, 7.98 MB, Dong Yang
HCE InstallMenu.pdf, 12/Jun/10 09:16, 207 kB, Fusheng Han
HCE Performance Report.pdf, 12/Jun/10 09:16, 349 kB, Fusheng Han
HCE Tutorial.pdf, 12/Jun/10 09:16, 173 kB, Fusheng Han
Overall Design of Hadoop C++ Extension.doc, 15/Mar/10 03:58, 613 kB, Dong Yang

Issue Links

is duplicated by
MAPREDUCE-2841 Task level native optimization (Resolved)

is related to
MAPREDUCE-2446 HCE 2.0 (Resolved)
MAPREDUCE-2841 Task level native optimization (Resolved)

People

Assignee: Unassigned
Reporter: Wang Shouyan
Votes: 7
Watchers: 53

Dates

Created: 07/Dec/09 08:09
Updated: 13/Apr/16 16:00
Resolved: 06/Nov/15 01:46