Description
When testing the new CSV reader I found that it does not infer the input schema, as the documentation states it should.
(I used this documentation: https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/python/pyspark.sql.html#pyspark.sql.SQLContext )
So either there is a bug in the implementation or in the documentation.
It also seems that, as a consequence, options like dateFormat are ignored.
Here's a quick test in pyspark (using Python3):
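# spark is the SparkSession predefined by the pyspark shell;
# test.csv is a single-column file holding the values 1 through 4.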
a = spark.read.csv("/home/ernst/test.csv")
a.printSchema()
print(a.dtypes)
a.show()
root
 |-- _c0: string (nullable = true)

[('_c0', 'string')]

+---+
|_c0|
+---+
|  1|
|  2|
|  3|
|  4|
+---+
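For reference, here is a minimal sketch of what the linked documentation suggests should work, assuming the inferSchema, dateFormat, and schema keyword arguments listed there for DataFrameReader.csv (the date format string is just an illustrative value). Explicitly supplying a schema is one possible workaround while inference is broken:

# Explicitly request inference; per the linked docs these options should
# control type inference and date parsing.
a = spark.read.csv("/home/ernst/test.csv", inferSchema=True,
                   dateFormat="yyyy-MM-dd")
a.printSchema()

# Possible workaround: supply the schema by hand instead of inferring it.
from pyspark.sql.types import StructType, StructField, IntegerType
schema = StructType([StructField("_c0", IntegerType(), True)])
b = spark.read.csv("/home/ernst/test.csv", schema=schema)
b.printSchema()  # root |-- _c0: integer (nullable = true)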
Issue Links
- blocks SPARK-12420 Have a built-in CSV data source implementation (Resolved)