[TEZ-1608] TopK example - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.5.0
Fix Version/s: None
Component/s: None
Labels: None
Target Version/s: 0.8.6

Description

The goal of this sample is to find the top K elements of a dataset while walking through the basics of Tez (DAG creation, tokenizers, custom comparators, and parallelism).

An example use case for top K:

Given a large data set of user comments on a site, in CSV format with the columns userid,postid,commentid,comment,timestamp, find the top K commenters or the K posts with the most comments.
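The use case above can be illustrated with a small single-process sketch in plain Java. This is not the code from the attached patches and it uses no Tez classes; it only shows the counting step and a bounded min-heap with a custom comparator, which is one common way to select the top K. The class name `TopKSketch`, the method `topK`, and the sample rows are hypothetical. The sketch assumes the first two columns never contain commas, so it splits each line into at most three parts and leaves the free-text comment untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopKSketch {

    // Ascending by count; ties are broken by key (reverse lexical order)
    // so that the reversed comparator lists equal counts alphabetically.
    static final Comparator<Map.Entry<String, Long>> BY_COUNT = (a, b) -> {
        int c = Long.compare(a.getValue(), b.getValue());
        return c != 0 ? c : b.getKey().compareTo(a.getKey());
    };

    // field 0 = userid, field 1 = postid (assumed column order from the description).
    static List<Map.Entry<String, Long>> topK(List<String> csvLines, int field, int k) {
        // Step 1: count rows per key.
        Map<String, Long> counts = new HashMap<>();
        for (String line : csvLines) {
            if (line.isEmpty()) {
                continue;
            }
            // Limit of 3 keeps commas inside the comment text from mattering.
            String[] parts = line.split(",", 3);
            if (parts.length <= field) {
                continue; // skip malformed rows
            }
            counts.merge(parts[field], 1L, Long::sum);
        }

        // Step 2: keep only the k largest counts in a min-heap.
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(BY_COUNT);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // evict the current smallest
            }
        }

        // Step 3: return the survivors, largest first.
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(BY_COUNT.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "u1,p1,c1,nice post,100",
            "u2,p1,c2,agreed, thanks,101",
            "u1,p2,c3,first,102",
            "u3,p1,c4,hello,103",
            "u1,p1,c5,again,104",
            "u2,p2,c6,ok,105");
        System.out.println("Top 2 commenters: " + topK(rows, 0, 2));
        System.out.println("Top 1 post: " + topK(rows, 1, 1));
    }
}
```

The heap never holds more than K + 1 entries, so selection costs O(n log K) over n distinct keys rather than a full sort. In a distributed setting the same two steps would typically be split across stages: per-key counting in parallel, then a final top K selection over the partial results.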

Attachments

- TEZ-1608-1.patch, 25 kB, 23/Sep/14 11:19, Krisztian Horvath
- TEZ-1608-2.patch, 27 kB, 24/Sep/14 13:44, Krisztian Horvath
- TEZ-1608-3.patch, 26 kB, 23/Oct/14 09:00, Krisztian Horvath

People

Assignee: Krisztian Horvath

Reporter: Janos Matyas

Votes: 0

Watchers: 8

Dates

Created: 22/Sep/14 08:39

Updated: 14/Mar/17 03:40