Spark / SPARK-21485

API Documentation for Spark SQL functions


Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: Documentation, SQL
    • Labels: None

    Description

      It looks like we can generate Spark's SQL function documentation from the ExpressionDescription and ExpressionInfo metadata.

      I had some time to play with this, so I made a rough version - https://spark-test.github.io/sparksqldoc/
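
      For context, the usage and extended text comes from the @ExpressionDescription annotation on each built-in expression in sql/catalyst. Paraphrased (a sketch, not an exact copy of the Spark source), an annotated expression looks roughly like this:

      // Paraphrased sketch of an annotated built-in expression; _FUNC_ is the
      // placeholder that the generation script below replaces with the name.
      @ExpressionDescription(
        usage = "_FUNC_(str) - Returns `str` with all characters changed to uppercase.",
        extended = """
          Examples:
            > SELECT _FUNC_('SparkSQL');
             SPARKSQL
        """)
      case class Upper(child: Expression) extends UnaryExpression {
        // ... eval / codegen elided ...
      }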

      The code I used is as below:

      In the PySpark shell:

      from collections import namedtuple
      
      # Python-side mirror of the JVM ExpressionInfo fields we need.
      ExpressionInfo = namedtuple("ExpressionInfo", "className usage name extended")
      
      # Fetch the ExpressionInfo of every built-in function through the Py4J gateway.
      jinfos = spark.sparkContext._jvm.org.apache.spark.sql.api.python.PythonSQLUtils.listBuiltinFunctions()
      infos = []
      for jinfo in jinfos:
          name = jinfo.getName()
          # Substitute the real function name for the _FUNC_ placeholder.
          usage = jinfo.getUsage()
          usage = usage.replace("_FUNC_", name) if usage is not None else usage
          extended = jinfo.getExtended()
          extended = extended.replace("_FUNC_", name) if extended is not None else extended
          infos.append(ExpressionInfo(
              className=jinfo.getClassName(),
              usage=usage,
              name=name,
              extended=extended))
      
      # Write one Markdown section per function, sorted by name.
      with open("index.md", 'w') as mdfile:
          # Trim the leading indentation of each line in usage/extended.
          strip = lambda s: "\n".join(map(lambda u: u.strip(), s.split("\n")))
          for info in sorted(infos, key=lambda i: i.name):
              mdfile.write("### %s\n\n" % info.name)
              if info.usage is not None:
                  mdfile.write("%s\n\n" % strip(info.usage))
              if info.extended is not None:
                  mdfile.write("```%s```\n\n" % strip(info.extended))
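
      As a side note, the same ExpressionInfo-backed text can be eyeballed for a single function from spark-shell with DESCRIBE FUNCTION EXTENDED (the function name here is just an example):

      // Prints the usage and extended description stored in ExpressionInfo
      // for one built-in function, assuming the usual `spark` SparkSession.
      spark.sql("DESCRIBE FUNCTION EXTENDED upper").show(truncate = false)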
      

      This change had to be made first, before running the code above:

      +++ b/sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala
      @@ -17,9 +17,15 @@
      
       package org.apache.spark.sql.api.python
      
      +import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
      +import org.apache.spark.sql.catalyst.expressions.ExpressionInfo
       import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
       import org.apache.spark.sql.types.DataType
      
       private[sql] object PythonSQLUtils {
         def parseDataType(typeText: String): DataType = CatalystSqlParser.parseDataType(typeText)
      +
      +  def listBuiltinFunctions(): Array[ExpressionInfo] = {
      +    FunctionRegistry.functionSet.flatMap(f => FunctionRegistry.builtin.lookupFunction(f)).toArray
      +  }
       }
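
      The flatMap is there because lookupFunction returns an Option[ExpressionInfo]. Spelled out with the same internal catalyst API the patch uses, the helper is roughly equivalent to this sketch:

      import org.apache.spark.sql.catalyst.analysis.FunctionRegistry

      // Collect the ExpressionInfo of every registered built-in function;
      // names without an info entry are dropped by the flatMap.
      val infos = FunctionRegistry.functionSet
        .flatMap(name => FunctionRegistry.builtin.lookupFunction(name))
        .toSeq
        .sortBy(_.getName)

      // Sanity-check a few of the usage strings.
      infos.take(3).foreach(i => println(s"${i.getName}: ${i.getUsage}"))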
      

      And then, I ran this:

      mkdir docs
      echo "site_name: Spark SQL 2.3.0" >> mkdocs.yml
      echo "theme: readthedocs" >> mkdocs.yml
      mv index.md docs/index.md
      mkdocs serve
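
      (The two echo lines just leave this minimal mkdocs.yml behind:)

      site_name: Spark SQL 2.3.0
      theme: readthedocs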
      


    People

        Assignee: Hyukjin Kwon (gurwls223)
        Reporter: Hyukjin Kwon (gurwls223)
        Votes: 0
        Watchers: 2
