Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-821

simulate NTILE(n) , rank() functionality in pig

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 0.2.0
    • 0.11
    • impl
    • None
    • mithril gold -gateway 4000

    Description

      Hi,

      I came across a job which has some processing which I cant seem to get easily over-the-counter from pig.
      These are NTILE() /rank() operations available in oracle.

      While I am trying to write a UDF, that is not working out too well for me yet..

      I have a ntile over (partititon by x, y, z order by a desc, b desc) operation to be done in pig scripts.
      Is there a default function in pig scripting which can do this?

      For example, lets consider a simple example at http://download.oracle.com/docs/cd/B14117_01/server.101/b10759/functions091.htm
      So here, how would we ideally substitute NTILE() with? any pig counterpart function/udf?

      SELECT last_name, salary, NTILE(4) OVER (ORDER BY salary DESC)
      AS quartile FROM employees
      WHERE department_id = 100;

      LAST_NAME SALARY QUARTILE
      ------------------------- ---------- ----------
      Greenberg 12000 1
      Faviet 9000 1
      Chen 8200 2
      Urman 7800 2
      Sciarra 7700 3
      Popp 6900 4

      In real case, i have ntile over multiple columns, so ideal way to find histograms/boundary/spitting out the bucket number is needed.

      Similarly a pig function is required for rank() over(partition by a,b,c order by d desc) as e

      Please let me know soon.

      Thanks & Regards,
      /Rekha

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              rekhajos Rekha
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: