Description
Currently, function documentation seems inconsistent, and most functions have no examples in their extended usage.
For example, some functions have poor indentation, as below:
spark-sql> DESCRIBE FUNCTION EXTENDED approx_count_distinct;
Function: approx_count_distinct
Class: org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus
Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++ with relativeSD, the maximum estimation error allowed.
Extended Usage:
No example for approx_count_distinct.

spark-sql> DESCRIBE FUNCTION EXTENDED count;
Function: count
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Count
Usage: count(*) - Returns the total number of retrieved rows, including rows containing NULL values.
count(expr) - Returns the number of rows for which the supplied expression is non-NULL.
count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
Extended Usage:
No example for count.
whereas some are formatted well:
spark-sql> DESCRIBE FUNCTION EXTENDED percentile_approx;
Function: percentile_approx
Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
Usage:
    percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation.
    percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate percentile array of column `col` at the given percentage array. Each value of the percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation.
Extended Usage:
No example for percentile_approx.
Also, spacing is inconsistent in several places, for example, FUNC(a,b) versus FUNC(a, b) (note the spacing between arguments).
It would be nicer if most of them had a good example with the possible argument types.
The suggested format for multi-line usage is as below:
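The FUNC(a,b) versus FUNC(a, b) inconsistency could be caught mechanically. Below is a minimal sketch, not part of Spark: `normalize_arg_spacing` is a hypothetical helper name, and it assumes it is applied only to the signature part of a usage string, since it rewrites every comma it sees.

```python
import re


def normalize_arg_spacing(signature: str) -> str:
    """Return the signature with each comma followed by exactly one space.

    Hypothetical helper for illustration; spaces or tabs around a comma
    are collapsed, so "FUNC(a,b)" and "FUNC(a , b)" both become "FUNC(a, b)".
    """
    return re.sub(r"[ \t]*,[ \t]*", ", ", signature)


def has_consistent_spacing(signature: str) -> bool:
    """True when the signature already uses the "a, b" style."""
    return signature == normalize_arg_spacing(signature)


if __name__ == "__main__":
    for sig in ["FUNC(a,b)", "FUNC(a, b)", "date_add(start_date,num_days)"]:
        print(sig, "->", normalize_arg_spacing(sig), has_consistent_spacing(sig))
```

Optional-argument notation such as `expr[, expr...]` passes through unchanged, because the comma there is already followed by a single space.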
spark-sql> DESCRIBE FUNCTION EXTENDED rand;
Function: rand
Class: org.apache.spark.sql.catalyst.expressions.Rand
Usage:
    rand() - Returns a random column with i.i.d. uniformly distributed values in [0, 1]. seed is given randomly.
    rand(seed) - Returns a random column with i.i.d. uniformly distributed values in [0, 1]. seed should be an integer/long/NULL literal.
Extended Usage:
    > SELECT rand();
     0.9629742951434543
    > SELECT rand(0);
     0.8446490682263027
    > SELECT rand(NULL);
     0.8446490682263027

For single-line usage:
spark-sql> DESCRIBE FUNCTION EXTENDED date_add;
Function: date_add
Class: org.apache.spark.sql.catalyst.expressions.DateAdd
Usage: date_add(start_date, num_days) - Returns the date that is num_days after start_date.
Extended Usage:
    > SELECT date_add('2016-07-30', 1);
     '2016-07-31'
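To make the suggested layout concrete, here is a small sketch that renders a description from its parts. It is only an illustration of the proposal: `render_description` is a hypothetical name, and the exact indentation widths are assumptions, not what Spark emits.

```python
from typing import List, Tuple


def render_description(
    name: str,
    cls: str,
    usages: List[str],
    examples: List[Tuple[str, str]],
) -> str:
    """Render a function description in the layout suggested above.

    A single usage stays on the "Usage:" line; several usages are listed
    one per line underneath it. Each example is a (query, result) pair.
    Indentation widths are illustrative assumptions.
    """
    lines = [f"Function: {name}", f"Class: {cls}"]
    if len(usages) == 1:
        lines.append(f"Usage: {usages[0]}")
    else:
        lines.append("Usage:")
        lines.extend(f"    {usage}" for usage in usages)
    lines.append("Extended Usage:")
    if not examples:
        # Mirrors the placeholder text seen in the outputs quoted earlier.
        lines.append(f"No example for {name}.")
    for query, result in examples:
        lines.append(f"    > {query}")
        lines.append(f"     {result}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_description(
            "date_add",
            "org.apache.spark.sql.catalyst.expressions.DateAdd",
            ["date_add(start_date, num_days) - Returns the date that is num_days after start_date."],
            [("SELECT date_add('2016-07-30', 1);", "'2016-07-31'")],
        )
    )
```

Keeping queries and results as pairs makes it easy to require at least one example per function, which is the main gap this issue describes.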
Issue Links
- is duplicated by: SPARK-17940 Typo in LAST function error message (Resolved)
- is related to: SPARK-17940 Typo in LAST function error message (Resolved)