Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 0.5.0
    • None
    • None
    • None

    Description

      The goal of this sample is to find the top K elements of a dataset while walking through the basics of Tez: DAG creation, tokenizers, custom comparators and parallelism.

      An example use case for top K:

      Given a large data set of user comments on a site, stored as CSV rows of the form userid,postid,commentid,comment,timestamp, find the top K commenters or the K posts with the most comments.
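      The counting and selection at the heart of this use case can be sketched in plain Java on a single machine. This is a minimal, hypothetical illustration, not code from the attached patches and not the Tez API: the class and method names are invented, rows are assumed to be well-formed with userid and postid as the first two columns, and the whole input is assumed to fit in memory. In the Tez sample this work would presumably be split across the vertices of a DAG.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical single-process sketch; not part of the Tez API or the attached patches.
public class TopKCommenters {

    // Ascending order by count; on equal counts the larger key sorts lower,
    // so the min-heap evicts tied keys in reverse alphabetical order.
    static final Comparator<Map.Entry<String, Long>> BY_COUNT = (a, b) -> {
        int c = Long.compare(a.getValue(), b.getValue());
        return c != 0 ? c : b.getKey().compareTo(a.getKey());
    };

    // Counts rows per value of the given column (0 = userid, 1 = postid).
    // Only leading columns are read, so commas inside the comment text are harmless.
    static Map<String, Long> countByColumn(List<String> csvLines, int column) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : csvLines) {
            String[] fields = line.split(",", -1);
            if (fields.length <= column) {
                continue; // skip malformed rows
            }
            counts.merge(fields[column], 1L, Long::sum);
        }
        return counts;
    }

    // Keeps the k largest counts in a min-heap of size k: O(m log k) for m distinct keys.
    static List<Map.Entry<String, Long>> topK(Map<String, Long> counts, int k) {
        List<Map.Entry<String, Long>> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(BY_COUNT);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the smallest of the k + 1 candidates
            }
        }
        result.addAll(heap);
        result.sort(BY_COUNT.reversed()); // highest count first, ties alphabetical
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "u1,p1,c1,nice post,1000",
            "u2,p1,c2,agreed,1001",
            "u1,p2,c3,thanks, very helpful,1002",
            "u3,p2,c4,hmm,1003",
            "u1,p3,c5,+1,1004",
            "u2,p2,c6,ok,1005");
        System.out.println(topK(countByColumn(lines, 0), 2)); // [u1=3, u2=2]
        System.out.println(topK(countByColumn(lines, 1), 2)); // [p2=3, p1=2]
    }
}
```

A bounded min-heap avoids sorting every distinct key: each key costs O(log K) instead of the O(log m) of a full sort, which pays off when there are many users or posts but K is small. The comparator is the same kind of custom ordering the sample description mentions, here applied in memory rather than to Tez's sort phase.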

      Attachments

        1. TEZ-1608-1.patch
          25 kB
          Krisztian Horvath
        2. TEZ-1608-2.patch
          27 kB
          Krisztian Horvath
        3. TEZ-1608-3.patch
          26 kB
          Krisztian Horvath


          People

            keyki Krisztian Horvath
            matyix Janos Matyas
            Votes: 0
            Watchers: 8
