[PIG-200] Pig Performance Benchmarks - ASF JIRA

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.2.0
Component/s: None
Labels: None

Description

To benchmark Pig performance, we need a large TPC-H-like data set plus a collection of scripts. These would be used to compare different Pig releases and to compare Pig against other systems (e.g. Pig + Hadoop vs. Hadoop only).

Here is the wiki page for small tests: http://wiki.apache.org/pig/PigPerformance

I am currently running long-running Pig scripts over data sets on the order of tens of TBs. The next step is hundreds of TBs.

We need an open large data set (that is, open source scripts that generate the data set) and detailed scripts for important operations such as ORDER, AGGREGATION, etc.
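
As an illustration of what an open data-generation script could look like, here is a minimal sketch in Python. The column layout (user, action, timestamp, revenue), the `generate_rows` name, and the skew method are all assumptions made for this example; the issue does not specify a schema, and this is not the attached generate_data.pl. The fixed seed makes the output reproducible, so different Pig releases can be compared on identical input.

```python
import random

# Hypothetical action values; the issue does not define a schema.
ACTIONS = ["view", "click", "purchase"]


def generate_rows(num_rows, num_users=1000, seed=42):
    """Yield tab-separated rows: user, action, timestamp, revenue."""
    rng = random.Random(seed)  # fixed seed so every run yields the same data
    for _ in range(num_rows):
        # Squaring a uniform draw skews toward low user ids, so some keys
        # occur much more often than others (useful for ORDER/GROUP tests).
        user_id = int(num_users * rng.random() ** 2)
        action = rng.choice(ACTIONS)
        timestamp = rng.randint(0, 86_399)  # seconds within one day
        revenue = round(rng.uniform(0.0, 100.0), 2)
        yield f"user{user_id}\t{action}\t{timestamp}\t{revenue}"


if __name__ == "__main__":
    # Print a handful of rows; redirect to a file for a larger data set.
    for row in generate_rows(5):
        print(row)
```

Scaling `num_rows` up and writing the output in parallel chunks (each with its own seed) would be one way to reach large sizes, though the right approach depends on the cluster setup.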

We can call those the Pig Workouts: Cardio (short processing), Marathon (long-running scripts), and Triathlon (a mix of both).

I will update this JIRA with more details of current activities soon.

Attachments

pigmix2.patch, 06/Jul/10 21:19, 200 kB, Daniel Dai
pigmix_pig0.11.patch, 14/Dec/11 19:26, 194 kB, Dmitriy V. Ryaboy
PIG-200-0.12.patch, 03/Apr/13 05:48, 218 kB, Daniel Dai
pig-0.8.1-vs-0.9.0.png, 27/Apr/12 00:38, 8 kB, Jie Li
perf-0.6.patch, 15/Mar/10 22:50, 152 kB, Daniel Dai
perf.patch, 04/Dec/08 22:52, 153 kB, Alan Gates
perf.hadoop.patch, 03/Aug/09 20:25, 33 kB, Ying He
generate_data.pl, 10/Jun/08 14:25, 10 kB, Alan Gates

Issue Links

relates to

PIG-2661 Pig uses an extra job for loading data in Pigmix L9 (Open)

People

Assignee: Alan Gates

Reporter: Amir Youssefi

Votes: 0

Watchers: 10

Dates

Created: 10/Apr/08 01:12

Updated: 02/Oct/13 21:50

Resolved: 26/Jan/09 20:21