Spark / SPARK-21485

API Documentation for Spark SQL functions


Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: Documentation, SQL
    • Labels: None

    Description

      It looks like we can generate Spark's SQL function documentation from the ExpressionDescription and ExpressionInfo metadata.

      I had some time to play with this, so I made a rough version - https://spark-test.github.io/sparksqldoc/
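
      For context, the usage and extended text comes from the @ExpressionDescription annotation on each built-in expression in sql/catalyst. Paraphrased (a sketch, not an exact copy of the Spark source), an annotated expression looks roughly like this:

      // Paraphrased sketch of an annotated built-in expression; _FUNC_ is the
      // placeholder that the generation script below replaces with the name.
      @ExpressionDescription(
        usage = "_FUNC_(str) - Returns `str` with all characters changed to uppercase.",
        extended = """
          Examples:
            > SELECT _FUNC_('SparkSQL');
             SPARKSQL
        """)
      case class Upper(child: Expression) extends UnaryExpression {
        // ... eval / codegen elided ...
      }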

      The code I used is as below:

      In the PySpark shell:

      from collections import namedtuple
      
      # Python-side mirror of the JVM ExpressionInfo fields we need.
      ExpressionInfo = namedtuple("ExpressionInfo", "className usage name extended")
      
      # Fetch the ExpressionInfo of every built-in function through the Py4J gateway.
      jinfos = spark.sparkContext._jvm.org.apache.spark.sql.api.python.PythonSQLUtils.listBuiltinFunctions()
      infos = []
      for jinfo in jinfos:
          name = jinfo.getName()
          # Substitute the real function name for the _FUNC_ placeholder.
          usage = jinfo.getUsage()
          usage = usage.replace("_FUNC_", name) if usage is not None else usage
          extended = jinfo.getExtended()
          extended = extended.replace("_FUNC_", name) if extended is not None else extended
          infos.append(ExpressionInfo(
              className=jinfo.getClassName(),
              usage=usage,
              name=name,
              extended=extended))
      
      # Write one Markdown section per function, sorted by name.
      with open("index.md", 'w') as mdfile:
          # Trim the leading indentation of each line in usage/extended.
          strip = lambda s: "\n".join(map(lambda u: u.strip(), s.split("\n")))
          for info in sorted(infos, key=lambda i: i.name):
              mdfile.write("### %s\n\n" % info.name)
              if info.usage is not None:
                  mdfile.write("%s\n\n" % strip(info.usage))
              if info.extended is not None:
                  mdfile.write("```%s```\n\n" % strip(info.extended))
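
      As a side note, the same ExpressionInfo-backed text can be eyeballed for a single function from spark-shell with DESCRIBE FUNCTION EXTENDED (the function name here is just an example):

      // Prints the usage and extended description stored in ExpressionInfo
      // for one built-in function, assuming the usual `spark` SparkSession.
      spark.sql("DESCRIBE FUNCTION EXTENDED upper").show(truncate = false)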
      

      This change had to be made first, before running the code above:

      +++ b/sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala
      @@ -17,9 +17,15 @@
      
       package org.apache.spark.sql.api.python
      
      +import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
      +import org.apache.spark.sql.catalyst.expressions.ExpressionInfo
       import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
       import org.apache.spark.sql.types.DataType
      
       private[sql] object PythonSQLUtils {
         def parseDataType(typeText: String): DataType = CatalystSqlParser.parseDataType(typeText)
      +
      +  def listBuiltinFunctions(): Array[ExpressionInfo] = {
      +    FunctionRegistry.functionSet.flatMap(f => FunctionRegistry.builtin.lookupFunction(f)).toArray
      +  }
       }
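
      The flatMap is there because lookupFunction returns an Option[ExpressionInfo]. Spelled out with the same internal catalyst API the patch uses, the helper is roughly equivalent to this sketch:

      import org.apache.spark.sql.catalyst.analysis.FunctionRegistry

      // Collect the ExpressionInfo of every registered built-in function;
      // names without an info entry are dropped by the flatMap.
      val infos = FunctionRegistry.functionSet
        .flatMap(name => FunctionRegistry.builtin.lookupFunction(name))
        .toSeq
        .sortBy(_.getName)

      // Sanity-check a few of the usage strings.
      infos.take(3).foreach(i => println(s"${i.getName}: ${i.getUsage}"))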
      

      And then, I ran this:

      mkdir docs
      echo "site_name: Spark SQL 2.3.0" >> mkdocs.yml
      echo "theme: readthedocs" >> mkdocs.yml
      mv index.md docs/index.md
      mkdocs serve
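
      (The two echo lines just leave this minimal mkdocs.yml behind:)

      site_name: Spark SQL 2.3.0
      theme: readthedocs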
      


    People

        Assignee: Hyukjin Kwon (gurwls223)
        Reporter: Hyukjin Kwon (gurwls223)
        Votes: 0
        Watchers: 2
