SPARK-29041

Allow createDataFrame to accept bytes as binary type

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.4, 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark
    • Labels: None

    Description

      spark.createDataFrame([[b"abcd"]], "col binary")
      

      simply fails as below:

      in Python 3:

      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/.../spark/python/pyspark/sql/session.py", line 787, in createDataFrame
          rdd, schema = self._createFromLocal(map(prepare, data), schema)
        File "/.../spark/python/pyspark/sql/session.py", line 442, in _createFromLocal
          data = list(data)
        File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
          verify_func(obj)
        File "/.../forked/spark/python/pyspark/sql/types.py", line 1403, in verify
          verify_value(obj)
        File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
          verifier(v)
        File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
          verify_value(obj)
        File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
          verify_acceptable_types(obj)
        File "/.../spark/python/pyspark/sql/types.py", line 1282, in verify_acceptable_types
          % (dataType, obj, type(obj))))
      TypeError: field col: BinaryType can not accept object b'abcd' in type <class 'bytes'>
      

      in Python 2:

      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/.../spark/python/pyspark/sql/session.py", line 787, in createDataFrame
          rdd, schema = self._createFromLocal(map(prepare, data), schema)
        File "/.../spark/python/pyspark/sql/session.py", line 442, in _createFromLocal
          data = list(data)
        File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
          verify_func(obj)
        File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
          verify_value(obj)
        File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
          verifier(v)
        File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
          verify_value(obj)
        File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
          verify_acceptable_types(obj)
        File "/.../spark/python/pyspark/sql/types.py", line 1282, in verify_acceptable_types
          % (dataType, obj, type(obj))))
      TypeError: field col: BinaryType can not accept object 'abcd' in type <type 'str'>
      

      bytes should also be accepted as binary type.
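
The tracebacks above end in a type check that rejects the value because its Python type is not in the set accepted for BinaryType. The following is a minimal, hypothetical sketch of that kind of check, not PySpark's actual code. The two mappings and the helper rows_with_bytearray are assumptions made for illustration: the sketch assumes that only bytearray is accepted before the fix and that the fix also admits bytes.

```python
# Simplified stand-in for the check named in the traceback.
# Hypothetical: these mappings are assumptions, not copied from PySpark.
ACCEPTED_BEFORE = {"BinaryType": (bytearray,)}       # assumed pre-fix behaviour
ACCEPTED_AFTER = {"BinaryType": (bytearray, bytes)}  # assumed post-fix behaviour


def verify_acceptable_types(data_type, obj, accepted):
    """Raise TypeError if the type of obj is not listed for data_type."""
    if type(obj) not in accepted[data_type]:
        raise TypeError("%s can not accept object %r in type %s"
                        % (data_type, obj, type(obj)))


def rows_with_bytearray(rows):
    """Hypothetical workaround: wrap bytes values in bytearray, row by row."""
    return [[bytearray(v) if isinstance(v, bytes) else v for v in row]
            for row in rows]
```

Under these assumptions, on an affected version the rows could be passed through rows_with_bytearray before calling createDataFrame, for example rows_with_bytearray([[b"abcd"]]), so that the value reaches the check as a bytearray.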

            People

              Assignee: Hyukjin Kwon (gurwls223)
              Reporter: Hyukjin Kwon (gurwls223)