Details
Improvement
Status: Resolved
Major
Resolution: Fixed
2.2.0
None
Description
I guess the programming guide (http://spark.apache.org/docs/latest/sql-programming-guide.html) documents the commonly used and important features rather than every feature and every option.
It seems JSON datasets (http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets) are documented there, whereas CSV datasets are not.
Nowadays, the two are pretty similar in APIs and options. Some options, such as wholeFile, are notable for both. Moreover, several options such as inferSchema and header are important for CSV because they affect the types and column names of the data.
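To make concrete why header and inferSchema matter, here is a minimal plain-Python sketch of what the two options change in the result. It is only a stand-in using the standard csv module, not Spark's implementation: the `read_csv` and `infer_value` helpers and the sample data are hypothetical, and the per-value type guessing is a simplification of Spark's per-column schema inference. In Spark itself the corresponding call would be along the lines of `spark.read.option("header", "true").option("inferSchema", "true").csv("path/to/file.csv")`, with a placeholder path.

```python
import csv
import io

# Hypothetical sample data; the first line holds the column names.
SAMPLE = "name,age\nAlice,30\nBob,25\n"


def infer_value(text):
    """Try int, then float, and fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def read_csv(text, header=False, infer_schema=False):
    """Toy reader mimicking the effect of the two options (not Spark's code)."""
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        # Use the first row as column names and drop it from the data.
        names, rows = rows[0], rows[1:]
    else:
        # Positional names, following Spark's _c0, _c1, ... convention.
        names = [f"_c{i}" for i in range(len(rows[0]))]
    if infer_schema:
        # Simplification: guess a type per value rather than per column.
        rows = [[infer_value(v) for v in row] for row in rows]
    return [dict(zip(names, row)) for row in rows]


plain = read_csv(SAMPLE)
typed = read_csv(SAMPLE, header=True, infer_schema=True)
```

Without either option, the header line is treated as an ordinary data row and every value stays a string. With both enabled, the columns are named `name` and `age`, and `age` holds integers.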
In that sense, I think we had better document CSV datasets with some examples too, because I believe reading CSV is a pretty common use case.
Also, I think we could leave pointers to the options in the API documentation for both (rather than duplicating the documentation).
So, my suggestion is,
- Add CSV Datasets section.
- Add links for the options of both JSON and CSV that point to the respective API documentation.
- Fix trivial issues in both sections together.