  1. Spark
  2. SPARK-20055

Documentation for CSV datasets in SQL programming guide

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Documentation
    • Labels: None

    Description

      I guess the programming guide (http://spark.apache.org/docs/latest/sql-programming-guide.html) documents the things that are commonly used and important, rather than every feature and every option.

      It seems JSON datasets (http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets) are documented there whereas CSV datasets are not.

      Nowadays, the two are pretty similar in APIs and options. Some options, such as wholeFile, are notable for both. Moreover, several options such as inferSchema and header are important for CSV because they affect the types and column names of the data.
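
      As a rough illustration of how similar the two readers are, here is a minimal PySpark sketch. The helper names read_csv_dataset and read_json_dataset and the file names are hypothetical and not part of this issue; the sketch only uses the DataFrameReader calls option, csv and json, and assumes an existing SparkSession is passed in.

```python
def read_csv_dataset(spark, path):
    """Read a CSV file, taking column names from the first line and
    letting Spark infer column types instead of reading all as strings.

    `spark` is assumed to be an existing pyspark.sql.SparkSession.
    This helper is hypothetical and only wraps DataFrameReader calls.
    """
    return (
        spark.read
        .option("header", "true")       # first line supplies the column names
        .option("inferSchema", "true")  # extra pass over the data to guess types
        .csv(path)
    )


def read_json_dataset(spark, path):
    """Read a JSON file through the same reader API, for comparison.

    Also hypothetical; the JSON reader infers the schema from the records.
    """
    return spark.read.json(path)


# Usage sketch (assumes pyspark is installed and the files exist):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   read_csv_dataset(spark, "people.csv").printSchema()
#   read_json_dataset(spark, "people.json").printSchema()
```

      Without header, the CSV reader names the columns _c0, _c1, and so on; without inferSchema, every column is read as a string. That is why these two options seem worth an example in the guide.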

      In that sense, I think we had better document CSV datasets with some examples too, because I believe reading CSV is a pretty common use case.

      Also, I think we could leave some pointers to the options in the API documentation for both (rather than duplicating the documentation).

      So, my suggestion is:

      • Add a CSV Datasets section.
      • Add links to the options for both JSON and CSV that point to the corresponding API documentation.
      • Fix trivial issues in both sections along the way.

          People

            Assignee: jomach Jorge Machado
            Reporter: gurwls223 Hyukjin Kwon
            Votes: 0
            Watchers: 4