[SPARK-15840] New csv reader does not "determine the input schema" - ASF JIRA

When testing the new CSV reader, I found that it does not determine the input schema, although the documentation states that it does.
(I used this documentation: https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/python/pyspark.sql.html#pyspark.sql.SQLContext )

So there is a bug in either the implementation or the documentation.

This also seems to mean that options like dateFormat are ignored.

Here's a quick test in pyspark (using Python 3). Every column comes back as a string, even though the file contains only integers:

a = spark.read.csv("/home/ernst/test.csv")
a.printSchema()
print(a.dtypes)
a.show()

 root
  |-- _c0: string (nullable = true)
 [('_c0', 'string')]
 +---+
 |_c0|
 +---+
 |  1|
 |  2|
 |  3|
 |  4|
 +---+
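
For comparison, here is a minimal sketch of the same read with schema inference requested explicitly. It assumes the reader accepts an inferSchema keyword, as DataFrameReader.csv does in released Spark 2.x, and that `spark` is an existing SparkSession as in the test above. The helper name read_csv_with_inferred_schema is hypothetical and only serves the illustration; whether this matches the behavior the documentation intended is exactly what this issue asks.

```python
def read_csv_with_inferred_schema(spark, path):
    """Read a CSV file and ask Spark to infer the column types.

    `spark` is assumed to be an existing SparkSession (for example the
    one provided by the pyspark shell). The helper name is hypothetical.
    """
    # Without inferSchema=True, the reader treats every column as a string,
    # which is the behavior shown in the output above.
    return spark.read.csv(path, inferSchema=True)
```

In the pyspark shell this could be used as a = read_csv_with_inferred_schema(spark, "/home/ernst/test.csv") followed by a.printSchema() to compare against the schema printed above.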

blocks

SPARK-12420 Have a built-in CSV data source implementation

links to

[Github] Pull Request #13576 (HyukjinKwon)