Spark / SPARK-15840

New csv reader does not "determine the input schema"

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      While testing the new CSV reader, I found that it does not determine the input schema, contrary to what the documentation states.
      (I used this documentation: https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/python/pyspark.sql.html#pyspark.sql.SQLContext )

      So either there is a bug in the implementation or in the documentation.

      This also means that options such as dateFormat appear to be ignored.

      Here's a quick test in PySpark (using Python 3):

      a = spark.read.csv("/home/ernst/test.csv")
      a.printSchema()
      print(a.dtypes)
      a.show()

       root
        |-- _c0: string (nullable = true)
       [('_c0', 'string')]
       +---+
       |_c0|
       +---+
       |  1|
       |  2|
       |  3|
       |  4|
       +---+
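For comparison, the same read can be repeated with schema inference requested explicitly. The sketch below assumes that the reader only infers column types when the inferSchema option is set, which is how the csv() reader is documented in released PySpark versions. The helper names write_sample_csv and read_csv_with_inference are made up for illustration, and the file contents are an assumption reconstructed from the four rows shown in the output above.

```python
import os
import tempfile


def write_sample_csv(directory):
    """Write a headerless one-column CSV holding 1..4.

    The contents are an assumption based on the rows shown in the report;
    the original test.csv is not attached to the issue.
    """
    path = os.path.join(directory, "test.csv")
    with open(path, "w") as handle:
        handle.write("1\n2\n3\n4\n")
    return path


def read_csv_with_inference(spark, path):
    """Read a CSV file with schema inference switched on explicitly.

    `spark` is an existing SparkSession. With inferSchema=True the reader
    makes an extra pass over the data to guess each column's type.
    """
    return spark.read.csv(path, inferSchema=True)


# Usage in a pyspark shell, where `spark` is already defined:
#   path = write_sample_csv(tempfile.mkdtemp())
#   df = read_csv_with_inference(spark, path)
#   df.printSchema()
```

If the assumption holds, printSchema() should report a numeric type for _c0 instead of string, which would point to the documentation wording rather than the implementation.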
      

            People

              Assignee: Hyukjin Kwon (hyukjin.kwon)
              Reporter: Ernst Sjöstrand (ernstp)