HIVE-259

Add PERCENTILE aggregate function

    Details

    • Hadoop Flags:
      Reviewed

      Description

      Compute at least the 25th, 50th, and 75th percentiles.

      1. Percentile.xlsx
        36 kB
        Jerome Boulon
      2. jb2.txt
        0.2 kB
        Jerome Boulon
      3. HIVE-259-3.patch
        14 kB
        Jerome Boulon
      4. HIVE-259-2.patch
        16 kB
        Jerome Boulon
      5. HIVE-259.patch
        9 kB
        Jerome Boulon
      6. HIVE-259.5.patch
        26 kB
        Zheng Shao
      7. HIVE-259.4.patch
        27 kB
        Zheng Shao
      8. HIVE-259.1.patch
        5 kB
        Zheng Shao

          Activity

          Edward Capriolo added a comment -

          The 95th percentile is very often used in Internet Service Provider billing, so that might be useful.

          The percentile calculation is a sort and then picking an element. The syntax could be like:

          • PERCENTILE(column, .99)
          • PERCENTILE(column, .50)

          In this manner you could do any percentile.

          Carl Steinbach added a comment -

          This would be a very useful function to have.

          For the sake of completeness (and without much additional effort) it would be nice to provide both PERCENTILE_DISC and PERCENTILE_CONT.

          PERCENTILE_CONT: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions110.htm
          PERCENTILE_DISC: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions111.htm
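
          To illustrate the difference between the two, here is a minimal sketch based on the definitions in the Oracle documentation linked above: PERCENTILE_CONT interpolates linearly at row number RN = 1 + P * (N - 1), while PERCENTILE_DISC returns the smallest value whose cumulative distribution is at least P. The class and method names are made up for this example, and it assumes a non-empty array sorted in ascending order; it is not Hive code.

```java
import java.util.Arrays;

/** Illustrative only: Oracle-style continuous vs. discrete percentiles. */
class PercentileKinds {

    /** PERCENTILE_CONT: linear interpolation at row number RN = 1 + p * (N - 1). */
    static double percentileCont(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double rn = 1 + p * (sorted.length - 1);   // 1-based fractional row number
        int frn = (int) Math.floor(rn);            // row just below RN
        int crn = (int) Math.ceil(rn);             // row just above RN
        if (frn == crn) {
            return sorted[frn - 1];                // RN falls exactly on a row
        }
        return (crn - rn) * sorted[frn - 1] + (rn - frn) * sorted[crn - 1];
    }

    /** PERCENTILE_DISC: smallest value whose cumulative distribution is >= p. */
    static double percentileDisc(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        for (int i = 0; i < n; i++) {
            if ((i + 1) / (double) n >= p) {
                return sorted[i];
            }
        }
        return sorted[n - 1];
    }

    public static void main(String[] args) {
        double[] salaries = {10, 20, 30, 40};
        System.out.println(percentileCont(salaries, 0.5)); // 25.0 (interpolated)
        System.out.println(percentileDisc(salaries, 0.5)); // 20.0 (an actual input value)
    }
}
```

          The discrete variant always returns a member of the input, while the continuous variant may return a value that never occurs in the data.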

          Todd Lipcon added a comment -

          An easy way to do this that would work for a ton of data sets would be to essentially do a counting sort. If you have only a few thousand distinct values in the column to be analyzed, just make a hashtable, count up how many you see, and then in the single reducer use the histogram to figure out the percentile. This should work great for datasets like age, and even for sets like "number of days since user signed up". For sets that are truly continuous, it would be useful when combined with a binning UDF to discretize them.

          Sadly it's not the general case, but it would be an easy first step.
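
          The counting approach described here could be sketched roughly as follows. This is only an illustration, not the patch attached to this issue: the class name is hypothetical, and it uses a simple nearest-rank rule (the smallest value whose cumulative count reaches ceil(p * n)) rather than any interpolation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: percentile from a histogram of distinct values. */
class HistogramPercentile {

    /** Counts occurrences of each distinct value (the "counting sort" step). */
    static Map<Long, Long> histogram(long[] values) {
        Map<Long, Long> counts = new HashMap<>();
        for (long v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        return counts;
    }

    /** Nearest-rank percentile: smallest value whose cumulative count reaches ceil(p * n). */
    static long percentile(Map<Long, Long> counts, double p) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        long targetRank = Math.max(1, (long) Math.ceil(p * total));
        long seen = 0;
        // Walk the distinct values in ascending order, accumulating counts.
        for (Map.Entry<Long, Long> e : new TreeMap<>(counts).entrySet()) {
            seen += e.getValue();
            if (seen >= targetRank) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("empty histogram");
    }

    public static void main(String[] args) {
        long[] ages = {30, 25, 30, 41, 25, 30, 58, 25, 30, 41};
        Map<Long, Long> counts = histogram(ages);
        System.out.println(percentile(counts, 0.5));  // 30
        System.out.println(percentile(counts, 0.25)); // 25
    }
}
```

          Memory use grows with the number of distinct values rather than the number of rows, which is why this works well for columns like age.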

          Zheng Shao added a comment -

          This is a good first step. We can provide some UDFs to "bucketize" the values first in case the user needs it.

          Jerome Boulon added a comment -

          It would also be good to be able to ask for more than one percentile at a time, e.g. PERCENTILE(column, .99), while keeping only a single structure in memory.
          ex: select PERCENTILE(column, .99), PERCENTILE(column, .50) from myTable;

          Carl Steinbach added a comment -

          @Jerome: Agreed. Allowing sort results to be shared by multiple functions (like in the following example) is key to supporting analytic functions efficiently.

          SELECT department_id,
             PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary DESC) 
                "Median cont",
             PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC) 
                "Median disc"
             FROM employees GROUP BY department_id;
          
          Jerome Boulon added a comment -

          First iteration for percentile (tested using Hive trunk and Hadoop 0.18.3):
          usage:
          CREATE TEMPORARY FUNCTION percentile AS 'org.apache.hadoop.hive.ql.udf.Percentile';
          select percentile(myColumn,"25,50,99") from MyTable;

          • How can I share the state object across functions?

          Zheng Shao added a comment -

          Jerome, it seems to me that the best data structure for counting is a HashMap, which allows near-constant-time insertion and find. When we "terminate" we can get the entries and sort them, but that cost should be small (it's a one-time cost and the number of unique items won't be too big - users should have used "round" to shrink the number of unique numbers).

          It seems currently we are paying log cost for each find, and O(n) cost for each insertion.

          Does that make sense?

          For sharing the state object, we can just declare the state class as public static.
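
          As a rough sketch of the cost structure described in this comment, the state could look like the following. The class is hypothetical and does not use Hive's UDAF interfaces; the method names iterate, merge, and terminate are chosen only to echo the terms used in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: HashMap-backed counting state, sorted once at the end. */
class CountingState {
    private final Map<Long, Long> counts = new HashMap<>();

    /** Called once per input row: expected constant time. */
    void iterate(long value) {
        counts.merge(value, 1L, Long::sum);
    }

    /** Combines a partial state from another task: one update per distinct value. */
    void merge(CountingState other) {
        for (Map.Entry<Long, Long> e : other.counts.entrySet()) {
            counts.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }

    /** Called once: sorts the k distinct values, a one-time O(k log k) cost. */
    List<Map.Entry<Long, Long>> terminate() {
        List<Map.Entry<Long, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.comparingByKey());
        return entries;
    }

    public static void main(String[] args) {
        CountingState a = new CountingState();
        a.iterate(5);
        a.iterate(3);
        a.iterate(5);
        CountingState b = new CountingState();
        b.iterate(3);
        b.iterate(9);
        a.merge(b);
        System.out.println(a.terminate()); // [3=2, 5=2, 9=1]
    }
}
```

          The per-row work stays cheap, and the sort is paid only once over the distinct values.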

          Todd Lipcon added a comment -

          Agreed re HashMap. Also, there should be some kind of setting that limits how much RAM gets used up. In a later iteration we could do adaptive histogramming once we hit the limit. In this version we should just throw up our hands and fail with a message that says the user needs to discretize harder.
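
          A minimal sketch of such a limit, assuming it is expressed as a maximum number of distinct values rather than bytes of RAM. The class name and the maxDistinct parameter are hypothetical and do not correspond to any real Hive setting.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: counter that fails once too many distinct values are seen. */
class BoundedCounter {
    private final int maxDistinct; // hypothetical limit, not a real Hive config key
    private final Map<Long, Long> counts = new HashMap<>();

    BoundedCounter(int maxDistinct) {
        this.maxDistinct = maxDistinct;
    }

    /** Counts the value, or fails if it would add a distinct value beyond the limit. */
    void add(long value) {
        if (!counts.containsKey(value) && counts.size() >= maxDistinct) {
            throw new IllegalStateException("More than " + maxDistinct
                + " distinct values; round or bin the column before computing percentiles");
        }
        counts.merge(value, 1L, Long::sum);
    }

    int distinctValues() {
        return counts.size();
    }

    public static void main(String[] args) {
        BoundedCounter counter = new BoundedCounter(2);
        counter.add(1);
        counter.add(1);
        counter.add(2);
        try {
            counter.add(3); // third distinct value exceeds the limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```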

          Jerome Boulon added a comment -

          I didn't know that we could use a HashMap in the state object ...
          Is there any limitation on what can be used in the state object, or can we use any Java object?
          Also, how is the state serialized between Map and Reduce?

          Zheng Shao added a comment -

          Jerome, I did a skeleton of the code to use HashMap. Do you want to start from there and add what is missing?

          Jerome Boulon added a comment -

          Sure, with Map support it's much simpler.

          Zheng Shao added a comment -

          > Is there any limitation on what can be used on the state object or can we use any java Object?
          We support primitive classes, HashMap (translated into map<> type in Hive), ArrayList (array type in Hive), and any simple struct-like classes (struct type in Hive).
          We support arbitrary levels of nesting, but no recursive types.

          > Also how is the state serialized between Map and Reduce?
          We use SerDe (see SerDe.serialize(...) ) to serialize/deserialize the objects, as well as translations between objects that have the same "type" (see ObjectInspector and ObjectInspectorConverters).

          Jerome Boulon added a comment -

          Percentile function

          Jerome Boulon added a comment -

          Percentile test file + validation using the Excel PERCENTILE function:
          CREATE TABLE JB2
          (
          duration bigint,
          code string
          )
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\n'
          STORED AS TEXTFILE;

          LOAD DATA LOCAL INPATH '/jb2.txt' INTO TABLE JB2;

          Result:
          hive> select percentile(duration,"25,50,99") from JB2;
          Ended Job = job_201002201654_0006
          OK
          [14.0,33.0,416.4000000000001]
          Time taken: 36.261 seconds

          hive> select code,percentile(duration,"25,50,99") from JB2 group by code;
          Ended Job = job_201002201654_0007
          OK
          a [2.0,17.5,427.2299999999999]
          b [22.75,44.5,345.84999999999997]
          c [18.0,29.0,58.760000000000005]
          Time taken: 23.419 seconds
          hive> quit;

          Jerome Boulon added a comment -

          Percentile function.
          Usage: select code,percentile(MyColumnB,"<P1,P2,P3,Px>") from <MyTable> group by <myColumn>;

          Carl Steinbach added a comment -

          Please fix the new Checkstyle errors in UDAFPercentile.java:

          35: Missing a Javadoc comment.
          39: Missing a Javadoc comment.
          39:10: 'public' modifier out of order with the JLS suggestions.
          41: Missing a Javadoc comment.
          41:12: 'public' modifier out of order with the JLS suggestions.
          42:15: Variable 'initDone' must be private and have accessor methods.
          43:7: Declaring variables, return values or parameters of type 'HashMap' is not allowed.
          43:35: Variable 'counts' must be private and have accessor methods.
          44:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
          44:26: Variable 'percentiles' must be private and have accessor methods.
          47: Missing a Javadoc comment.
          47:12: 'public' modifier out of order with the JLS suggestions.
          56:11: Variable 'state' must be private and have accessor methods.
          82:43: Name '_percentiles' must match pattern '^[a-z][a-zA-Z0-9]*$'.
          85:28: Expression can be simplified.
          105:39: ')' is preceded with whitespace.
          117:26: Expression can be simplified.
          125:65: Name 'RN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
          129:12: Name 'CRN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
          130:12: Name 'FRN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
          164:12: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
          173: Line is longer than 100 characters.
          184:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
          188:12: Name 'N' must match pattern '^[a-z][a-zA-Z0-9]*$'.
          189:14: Name 'RN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
          191:16: Name 'P' must match pattern '^[a-z][a-zA-Z0-9]*$'.

          Jerome Boulon added a comment -

          @Carl: How did you get this list?

          Also, I'm not sure I understand this:

          Why are HashMap and ArrayList not allowed if they are supported?

          43:7: Declaring variables, return values or parameters of type 'HashMap' is not allowed.
          44:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
          164:12: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
          184:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.

          Alex Loddengaard added a comment -

          Hey Jerome,

          I assume it's because you're supposed to use the interface type (e.g. Map or List) for return types, parameter types, and declaring variables.

          Correct me if I'm wrong, those of you more knowledgeable about Hive's checkstyle.

          Alex

          Carl Steinbach added a comment -

          > How did you get this list?

          Run 'ant checkstyle'. The list of violations gets dumped to build/checkstyle/checkstyle-errors.txt.

          > Why HashMap and ArrayList are not allowed if supported?

          You're allowed to use ArrayList and HashMap, but you're supposed to refer
          to instances of these classes using the interface (List or Map) instead of the
          concrete type, e.g.

          Map<String, String> myMap = new HashMap<String, String>();
          
          public List<String> getStringList() {
             return new ArrayList<String>();
          }
          
          Zheng Shao added a comment -

          Also see http://wiki.apache.org/hadoop/Hive/HowToContribute#Coding_Convention

          Zheng Shao added a comment -

          The test cases look a bit too trivial, or do the results have problems? They always return the same number for the 3 different percentile values.

          Zheng Shao added a comment -

          1. We are converting "25,50,99" to ArrayList<Integer>. Why don't we directly accept an int array (or a double array to allow 99.9)?

          In the query, the user can say:

          SELECT percentile(mycol, array(25, 50, 99)) FROM mytable;

          2. Get rid of State.initDone. We can set "ArrayList<Integer> percentiles" to null first. That saves some space in memory as well as network when we transfer the state from mapper to reducer.

          3. In Java, variable names should start with a lowercase letter.

          4. We should change the test case to be non-trivial.

          Jerome Boulon added a comment -
          • From my point of view, changing variable access to private in the state object will not make the code more readable ...
          • I'll change all variables to lowerCase to match Java style; the current variable names are based on the Oracle definition.

          @Zheng - I'm not using an ArrayList<Integer> but a String, to avoid unnecessary object creation (for every single row) ... it would be even better if the constructor could have been used, but I haven't found how to do that. If we care about 1 extra empty ArrayList per mapper/spill in memory, then we should care about creating (1 ArrayList + 1 Integer object per percentile) per row.

          @Zheng - Regarding the test case, that's what I had in mind when I asked you how to create my own table, and that's exactly the reason why I posted the Jb2.* files.

          Jerome Boulon added a comment -

          Can someone explain how I can create/populate a new table to be used by the ant test target?

          Carl Steinbach added a comment -

          @Jerome: take a look at ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java

          Zheng Shao added a comment -

          Hi Jerome, using ArrayList<Integer> won't cause unnecessary Object creation. We will just create a single ArrayList<Integer> and use it forever.
          Does that make sense?

          Todd Lipcon added a comment -

          Doesn't the autoboxing of Integer types actually allocate objects? I think the JVM only flyweights very small integers (iirc only from -128 to 127).
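
          The boxing behavior in question can be checked with a small experiment. The Java Language Specification guarantees that int values from -128 to 127 box to the same cached Integer instance; outside that range a new object may be allocated each time, depending on the JVM and its settings. The class name below is made up for this example.

```java
/** Illustrative only: shows when autoboxing reuses a cached Integer instance. */
class IntegerCacheDemo {

    /** Boxes the same int twice and reports whether both boxes are the same object. */
    static boolean sameBoxedInstance(int value) {
        Integer a = value; // autoboxing compiles to Integer.valueOf(value)
        Integer b = value;
        return a == b;     // reference comparison, on purpose
    }

    public static void main(String[] args) {
        // Guaranteed by the language specification for -128..127.
        System.out.println(sameBoxedInstance(127));
        // Outside the guaranteed range the result is JVM-dependent.
        System.out.println(sameBoxedInstance(1000));
    }
}
```

          So a list holding only percentile values such as 25, 50, and 99 would reuse cached Integer objects, while larger values may allocate.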

          Jerome Boulon added a comment -

          Percentiles that match included test case

          Jerome Boulon added a comment -
          • Use Double instead of Integer for percentiles, so we can ask for the 99.999th percentile
          • Checkstyle fixes, except for the State object
          • New test case

          Jerome Boulon added a comment -

          HIVE-259-3.patch

          Zheng Shao added a comment -

          This one fixes all checkstyle errors, and uses *Writable classes to avoid creating new objects as much as possible.

          He Yongqiang added a comment -

          The code looks very good. Thanks for the code work, Jerome and Zheng!
          Just some minor comments:
          (1) I am not familiar with the exact definition of the percentile function. Must percentile()'s result be a member of the input data?
          (2) HashMap and ArrayList are used to copy and sort. Can we use a TreeMap here? This is small and can be ignored.
          In the beginning of the new test case,
          DESCRIBE FUNCTION percentile;
          DESCRIBE FUNCTION EXTENDED percentile;
          appear two times.

          And this is a very good function to have; it would be great if we could add its usage to the wiki page or somewhere.

          Zheng Shao added a comment -

          We take the method recommended by NIST.

          See http://en.wikipedia.org/wiki/Percentile#Alternative_methods

          Zheng Shao added a comment -

          > (1) I am not familiar with the exact definition of percentile function. Is the percentile()'s result must be a member of input data?
          See the link above.

          > (2) HashMap and ArrayList is used to copy and sort. Can we use tree map here? this is a small and can be ignored.
          I think HashMap is better here. The reason is that the number of "iterate" calls is usually much higher than the number of unique numbers (the size of the HashMap). By using HashMap we reduce the cost of "iterate".

          > In the beginning of new test case, .. appears two times
          Fixed in HIVE-259.5.patch

          He Yongqiang added a comment -

          looks good, will test and commit.

          He Yongqiang added a comment -

          Committed. Thanks for the hard work, Jerome Boulon and Zheng.

          Btw, I manually fixed a show_function.q diff. Please update the usage of the percentile function on the wiki or somewhere.

          Ning Zhang added a comment -

          Hi Jerome and Zheng,

          Could either of you write up the syntax and semantics of the percentile function on the wiki page (http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF or http://wiki.apache.org/hadoop/Hive/HiveUDFGuide)?

          Thanks,

          John Sichi added a comment -

          I couldn't see the point of having two competing UDF guide pages, so I renamed the XPath-specific one as such and linked it from the main one. Just housekeeping to reduce confusion; I did not actually add the percentile info.

          John Sichi added a comment -

          PERCENTILE docs are still missing on the consolidated page:

          http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF


            People

            • Assignee:
              Jerome Boulon
              Reporter:
              Venky Iyer
            • Votes:
              3
              Watchers:
              10
