SPARK-37697

Make it easier to convert NumPy arrays to Spark DataFrames


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      Make it easier to convert NumPy arrays to Spark DataFrames.

      Passing a NumPy array directly to spark.createDataFrame often fails with a schema inference error:

      df = spark.createDataFrame(numpy.arange(10))
      Can not infer schema for type: <class 'numpy.int64'>

      or

      df = spark.createDataFrame(numpy.arange(10.))
      Can not infer schema for type: <class 'numpy.float64'>

      Today (Spark 3.x) we have to wrap the array in a pandas DataFrame first:

      spark.createDataFrame(pd.DataFrame(numpy.arange(10.))) 

      Make this easier by supporting direct conversion from NumPy arrays to Spark DataFrames.
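
Until a direct conversion exists, one pandas-free workaround is to turn the NumPy scalars into native Python values before calling createDataFrame, since the errors above name numpy.int64 and numpy.float64 as the types that cannot be inferred. The following is only a minimal sketch: the helper name ndarray_to_rows and the column name "value" are hypothetical and not part of PySpark, and the createDataFrame call is left as a comment because it assumes an active SparkSession bound to spark.

```python
import numpy as np


def ndarray_to_rows(arr):
    """Convert a 1-D or 2-D NumPy array into a list of tuples of native Python values.

    Hypothetical helper for illustration; not part of PySpark.
    """
    arr = np.asarray(arr)
    if arr.ndim == 1:
        # Treat a 1-D array as a single column: one value per row.
        arr = arr.reshape(-1, 1)
    elif arr.ndim != 2:
        raise ValueError(f"expected a 1-D or 2-D array, got {arr.ndim} dimensions")
    # tolist() converts numpy.int64 / numpy.float64 elements to Python int / float.
    return [tuple(row) for row in arr.tolist()]


rows = ndarray_to_rows(np.arange(10))

# Assuming an active SparkSession named `spark` (not created in this sketch):
# df = spark.createDataFrame(rows, schema=["value"])
```

This keeps the row and column layout of the array and leaves column naming to the caller. For large arrays the pandas route may still be preferable, since building a Python list of tuples copies every element.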


          People

            • Assignee: Unassigned
            • Reporter: Douglas Moore (douglas.moore@databricks.com)
