[PIG-1104] [zebra] Provide streaming support in Zebra. - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.4.0
Fix Version/s: 0.6.0, 0.7.0
Component/s: None
Labels: None

Description

Hadoop streaming is very popular among Hadoop users. Its main attraction is simplicity: a user can write the application logic in any language and process large amounts of data using the Hadoop framework. As more people start to use Zebra to store their data, we expect they will want to run Hadoop streaming scripts to process Zebra tables easily.

The following is a simple example of using Hadoop streaming to access Zebra data. It loads data from the table foo using Zebra's TableInputFormat and then writes the data to output using the default TextOutputFormat.

$ hadoop jar hadoop-streaming.jar -D mapred.reduce.tasks=0 -input foo -output output -mapper 'cat' -inputformat org.apache.hadoop.zebra.mapred.TableInputFormat

In more detail, Zebra uses Pig's DefaultTuple implementation of Tuple for its records. Currently, when Zebra's TableInputFormat is used for input, the user script sees each line as "key_if_any\tTuple.toString()". We plan to generate a CSV representation of our Pig tuples instead. To this end, we plan to do the following:

1) Derive a subclass ZebraTuple from Pig's DefaultTuple class and override its toString() method to present the data in CSV format.

2) On the Zebra side, change the tuple factory to create ZebraTuple objects instead of DefaultTuple objects.
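
To illustrate what this would mean for a streaming script, here is a minimal sketch of how a Python mapper might split one input line once tuples are rendered as CSV. It assumes each line is an optional key, a tab, and then a CSV record that follows standard CSV quoting rules; the exact quoting Zebra emits is not specified in this issue, and the function name parse_line is hypothetical.

```python
import csv
import io


def parse_line(line):
    """Split one streaming input line into (key, fields).

    Assumed layout (not confirmed by this issue): "key<TAB>csv_record".
    If the line contains no tab, the whole line is treated as the CSV
    record and the key is None.
    """
    line = line.rstrip("\n")
    key, sep, record = line.partition("\t")
    if not sep:
        # No tab found: there is no key, only the CSV record.
        key, record = None, line
    # Let the csv module handle quoted fields and embedded commas.
    fields = next(csv.reader(io.StringIO(record)), [])
    return key, fields
```

A mapper would call parse_line on each line read from standard input. Note that splitting on the first tab is itself an assumption: a keyless record whose first field contains a tab would be misparsed.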

Note that we can only support streaming on the input side, that is, the ability to use streaming to read data from Zebra tables. Streaming support on the output side is not feasible: the streaming mapper or reducer only emits "Text\tText", so the output collector has no way of knowing how to convert this to (BytesWritable, Tuple).

Attachments

PIG-1104.patch, 04/Dec/09 23:42, 36 kB, Chao Wang

People

Assignee: Chao Wang

Reporter: Chao Wang

Votes: 0

Watchers: 4

Dates

Created: 23/Nov/09 22:43

Updated: 24/Mar/10 22:15

Resolved: 07/Dec/09 22:07