[SPARK-20055] Documentation for CSV datasets in SQL programming guide - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.3.0
Component/s: Documentation
Labels: None

Description

As I understand it, the SQL programming guide (http://spark.apache.org/docs/latest/sql-programming-guide.html) documents the commonly used and important features rather than every feature and every option.

JSON datasets are documented there (http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets), whereas CSV datasets are not.

Nowadays the two are quite similar in APIs and options. Some options, such as wholeFile, are notable for both. Moreover, several CSV options, such as inferSchema and header, are important because they affect the column types and column names of the data.

For that reason, I think we should document CSV datasets as well, with some examples, because reading CSV is a very common use case.
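To make the effect of header and inferSchema concrete, here is a minimal plain-Python sketch of what the two options change. It is not Spark's implementation: `read_csv`, `infer_type`, and `SAMPLE` are hypothetical names made up for illustration, and the type inference is deliberately simplified to int, float, and str. In Spark itself the options are set on the reader, for example `spark.read.option("header", "true").option("inferSchema", "true").csv(path)`.

```python
import csv
import io

# Hypothetical sample data, standing in for a CSV file on disk.
SAMPLE = "name,age\nAlice,30\nBob,25\n"


def infer_type(values):
    """Return the narrowest of int, float, str that parses every value."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
            return candidate
        except ValueError:
            continue
    return str


def read_csv(text, header=False, infer_schema=False):
    """Parse CSV text and return (schema, rows).

    header=False      -> columns are named _c0, _c1, ... and the first
                         line is treated as data.
    infer_schema=False -> every column stays a string.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        names, rows = rows[0], rows[1:]
    else:
        names = [f"_c{i}" for i in range(len(rows[0]))]
    columns = list(zip(*rows))
    types = [infer_type(col) if infer_schema else str for col in columns]
    schema = [(name, t.__name__) for name, t in zip(names, types)]
    data = [tuple(t(v) for t, v in zip(types, row)) for row in rows]
    return schema, data


# Defaults: generated column names, all strings, header line kept as data.
schema_default, data_default = read_csv(SAMPLE)

# Both options on: names come from the first line, age becomes an int.
schema_typed, data_typed = read_csv(SAMPLE, header=True, infer_schema=True)
```

With the defaults, the header line ends up as an ordinary data row and every column is a string, which is why these two options deserve a mention in the guide.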

I also think we could link to the option lists in the API documentation for both formats rather than duplicating that documentation in the guide.

So, my suggestion is:

Add a CSV Datasets section.
Add links, for both JSON and CSV, that point to the options in the respective API documentation.
Fix minor issues in both sections at the same time.

Issue Links

links to

[Github] Pull Request #19429 (jomach)

[Github] Pull Request #19485 (jomach)

People

Assignee: Jorge Machado

Reporter: Hyukjin Kwon

Votes: 0

Watchers: 4

Dates

Created: 22/Mar/17 08:28

Updated: 12/Dec/22 18:10

Resolved: 22/Oct/18 18:34