Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
0.5.0
Description
The goal of this sample is to find the top K elements of a dataset while walking through the basics of Tez (DAG creation, tokenizers, custom comparators, and parallelism).
An example use case for top K:
Given a large CSV data set of user comments on a site, with the columns userid,postid,commentid,comment,timestamp, find the top K commenters, or the top K posts with the most comments.
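A minimal single-JVM sketch of the counting and top-K logic for this use case follows. It is not Tez code: in the Tez sample these steps would presumably be spread across DAG vertices, whereas here they run in one process only to show the logic. The class and method names (`TopKComments`, `countBy`, `topK`) are hypothetical. The sketch assumes the column order listed above and unquoted CSV; it splits each row with a limit of 3 so that commas inside the comment field do not shift userid or postid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: single-machine illustration of top K, not the Tez sample itself.
public class TopKComments {

    // Column positions assumed from the layout userid,postid,commentid,comment,timestamp.
    static final int USER_ID = 0;
    static final int POST_ID = 1;

    // Count rows per key. split(",", 3) stops after the first two commas, so commas
    // inside the comment cannot shift userid/postid (quoted CSV is not handled).
    static Map<String, Long> countBy(List<String> lines, int column) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",", 3);
            if (fields.length <= column) {
                continue; // skip malformed rows
            }
            counts.merge(fields[column], 1L, Long::sum);
        }
        return counts;
    }

    // Custom ordering: higher count first, then key ascending so ties are deterministic.
    static final Comparator<Map.Entry<String, Long>> BY_COUNT_DESC = (a, b) -> {
        int byCount = Long.compare(b.getValue(), a.getValue());
        return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    };

    // Keep at most k entries in a heap whose head is the worst entry,
    // evicting it whenever the heap grows past k: O(n log k) instead of a full sort.
    static List<Map.Entry<String, Long>> topK(Map<String, Long> counts, int k) {
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(BY_COUNT_DESC.reversed());
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the current worst entry
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(BY_COUNT_DESC); // best first
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "u1,p1,c1,nice post,1400000000",
                "u2,p1,c2,agree, totally,1400000001",
                "u1,p2,c3,hmm,1400000002",
                "u3,p1,c4,ok,1400000003",
                "u1,p1,c5,again,1400000004");
        System.out.println(topK(countBy(lines, USER_ID), 2)); // prints [u1=3, u2=1]
        System.out.println(topK(countBy(lines, POST_ID), 1)); // prints [p1=4]
    }
}
```

The bounded heap keeps memory proportional to K rather than to the number of distinct users or posts, which is the same reason a top-K stage in a distributed job typically only emits its local best K candidates downstream.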