Spark / SPARK-4561

PySparkSQL's Row.asDict() should convert nested rows to dictionaries


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.5.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      In PySpark, you can call {{.asDict()}} on a Spark SQL Row to convert it to a dictionary. Unfortunately, though, this does not convert nested rows to dictionaries. For example:

      >>> sqlContext.sql("select results from results").first()
      Row(results=[Row(time=3.762), Row(time=3.47), Row(time=3.559), Row(time=3.458), Row(time=3.229), Row(time=3.21), Row(time=3.166), Row(time=3.276), Row(time=3.239), Row(time=3.149)])
      >>> sqlContext.sql("select results from results").first().asDict()
      {u'results': [(3.762,),
        (3.47,),
        (3.559,),
        (3.458,),
        (3.229,),
        (3.21,),
        (3.166,),
        (3.276,),
        (3.239,),
        (3.149,)]}
      

      Actually, it looks like the nested fields are just left as Rows (IPython's fancy display logic obscured this in my first example):

      >>> Row(results=[Row(time=1), Row(time=2)]).asDict()
      {'results': [Row(time=1), Row(time=2)]}
      

      Here's the output I'd expect:

      >>> Row(results=[Row(time=1), Row(time=2)]).asDict()
      {'results': [{'time': 1}, {'time': 2}]}
      
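      In the meantime, nested rows can be flattened by hand. Here's a minimal sketch of such a workaround (the helper name {{to_dict}} is purely illustrative, not part of the PySpark API):

      from pyspark.sql import Row

      def to_dict(value):
          # Recursively convert Rows (including Rows nested inside lists and
          # dicts) into plain Python dictionaries; leave other values as-is.
          if isinstance(value, Row):
              return {k: to_dict(v) for k, v in value.asDict().items()}
          if isinstance(value, list):
              return [to_dict(v) for v in value]
          if isinstance(value, dict):
              return {k: to_dict(v) for k, v in value.items()}
          return value

      >>> to_dict(Row(results=[Row(time=1), Row(time=2)]))
      {'results': [{'time': 1}, {'time': 2}]}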

      I ran into this issue when trying to use Pandas dataframes to display nested data that I queried from Spark SQL.
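      For reference, the fix targeted at 1.5.0 adds an optional {{recursive}} flag to {{asDict()}}; assuming a version with that flag, the conversion above becomes:

      >>> from pyspark.sql import Row
      >>> Row(results=[Row(time=1), Row(time=2)]).asDict(recursive=True)
      {'results': [{'time': 1}, {'time': 2}]}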


            People

              Assignee: Davies Liu (davies)
              Reporter: Josh Rosen (joshrosen)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: