SPARK-41817 (sub-task of SPARK-41284: Feature parity: I/O in Spark Connect)

SparkSession.read support reading with schema


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Connect
    • Labels: None

    Description

      File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", line 122, in pyspark.sql.connect.readwriter.DataFrameReader.load
      Failed example:
          with tempfile.TemporaryDirectory() as d:
              # Write a DataFrame into a CSV file with a header
              df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}])
              df.write.option("header", True).mode("overwrite").format("csv").save(d)
      
              # Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon',
              # and 'header' option set to `True`.
              df = spark.read.load(
                  d, schema=df.schema, format="csv", nullValue="Hyukjin Kwon", header=True)
              df.printSchema()
              df.show()
      Exception raised:
          Traceback (most recent call last):
            File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
              exec(compile(example.source, filename, "single",
            File "<doctest pyspark.sql.connect.readwriter.DataFrameReader.load[1]>", line 10, in <module>
              df.printSchema()
            File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1039, in printSchema
              print(self._tree_string())
            File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1035, in _tree_string
              query = self._plan.to_proto(self._session.client)
            File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 92, in to_proto
              plan.root.CopyFrom(self.plan(session))
            File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/plan.py", line 245, in plan
              plan.read.data_source.schema = self.schema
          TypeError: bad argument type for built-in operation 
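
The traceback ends at `plan.read.data_source.schema = self.schema` in `plan.py`. This suggests the reader passed the `StructType` object straight into a protobuf field that only accepts a string. The sketch below shows one way to avoid that by normalizing the schema to a string (JSON for a `StructType`, unchanged for a DDL string) before assigning it. It is a minimal, self-contained illustration under stated assumptions, not the actual patch. `ToyStructType`, `ToyDataSource` and `schema_to_string` are hypothetical stand-ins for `pyspark.sql.types.StructType`, the `Read.DataSource` proto message, and the reader-side conversion. The sketch also assumes the proto `schema` field is a plain string.

```python
import json
from typing import Any, Optional


class ToyStructType:
    """Hypothetical stand-in for pyspark.sql.types.StructType (only .json() is modeled)."""

    def __init__(self, fields: list) -> None:
        self.fields = fields  # list of (name, type_name) pairs

    def json(self) -> str:
        # Same overall shape as StructType.json(): a "struct" with a "fields" list.
        return json.dumps({
            "type": "struct",
            "fields": [
                {"name": name, "type": type_name, "nullable": True, "metadata": {}}
                for name, type_name in self.fields
            ],
        })


class ToyDataSource:
    """Hypothetical stand-in for the Read.DataSource proto message.

    Assumes `schema` is a string field. Like generated protobuf code, the setter
    rejects anything that is not a str.
    """

    def __init__(self) -> None:
        self._schema: Optional[str] = None

    @property
    def schema(self) -> Optional[str]:
        return self._schema

    @schema.setter
    def schema(self, value: Any) -> None:
        if not isinstance(value, str):
            raise TypeError(f"schema must be str, got {type(value).__name__}")
        self._schema = value


def schema_to_string(schema: Any) -> str:
    """Normalize a user-supplied schema to the string form the plan field accepts."""
    if isinstance(schema, str):
        # DDL string such as "age LONG, name STRING": pass it through unchanged.
        return schema
    if callable(getattr(schema, "json", None)):
        # StructType-like object: serialize it to its JSON representation.
        return schema.json()
    raise TypeError(f"schema must be a str or StructType, got {type(schema).__name__}")


if __name__ == "__main__":
    schema = ToyStructType([("age", "long"), ("name", "string")])
    source = ToyDataSource()

    try:
        # Mirrors the failing line in plan.py: assigning the object directly.
        source.schema = schema
    except TypeError as exc:
        print("rejected:", exc)

    # Serialize first, then assign: this succeeds.
    source.schema = schema_to_string(schema)
    print(source.schema)
```

Converting on the client keeps the plan field a simple string. This assumes the server can parse either a DDL string or `StructType` JSON from that field, so check that against the Spark Connect version in use.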

People

    • Assignee: Sandeep Singh (techaddict)
    • Reporter: Sandeep Singh (techaddict)
    • Votes: 0
    • Watchers: 3