SPARK-12467

Get rid of sorting in Row's constructor in pyspark

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.5.2, 2.2.0
    • Fix Version/s: None
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      The current implementation of Row's __new__ sorts columns by name.
      First of all, there is no obvious reason to sort. Second, if one converts a DataFrame to an RDD and then back to a DataFrame, the column order changes. While this is not a bug, it nevertheless makes looking at the data really inconvenient.

      def __new__(self, *args, **kwargs):
          if args and kwargs:
              raise ValueError("Can not use both args "
                               "and kwargs to create Row")
          if args:
              # create row class or objects
              return tuple.__new__(self, args)

          elif kwargs:
              # create row objects
              names = sorted(kwargs.keys())  # just get rid of sorting here!!!
              row = tuple.__new__(self, [kwargs[n] for n in names])
              row.__fields__ = names
              return row

          else:
              raise ValueError("No args or kwargs")
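
      To make the effect of the sort concrete, here is a minimal sketch that does not use pyspark at all. SortedRow and OrderedRow are hypothetical stand-in classes written for this illustration: the first mirrors the kwargs branch quoted above, the second shows what dropping the sort might look like. The second variant assumes Python 3.6 or later, where keyword arguments keep the order in which the caller passed them (PEP 468).

```python
class SortedRow(tuple):
    """Hypothetical stand-in that mirrors the kwargs branch quoted above."""

    def __new__(cls, **kwargs):
        if not kwargs:
            raise ValueError("No kwargs")
        # Alphabetical field order, as in the quoted constructor.
        names = sorted(kwargs.keys())
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


class OrderedRow(tuple):
    """Hypothetical variant that keeps the caller's keyword order."""

    def __new__(cls, **kwargs):
        if not kwargs:
            raise ValueError("No kwargs")
        # Assumption: Python 3.6+, where **kwargs preserves call order (PEP 468).
        names = list(kwargs.keys())
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


sorted_row = SortedRow(name="Alice", age=30)
ordered_row = OrderedRow(name="Alice", age=30)

print(sorted_row.__fields__, tuple(sorted_row))    # ['age', 'name'] (30, 'Alice')
print(ordered_row.__fields__, tuple(ordered_row))  # ['name', 'age'] ('Alice', 30)
```

      One plausible reason for the sort is that, before Python 3.6, keyword arguments arrived in arbitrary order, so sorting was a simple way to get a deterministic field order. The sketch above would not be deterministic on those older interpreters.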

            People

              Assignee: Unassigned
              Reporter: Irakli Machabeli (imachabeli)
              Votes: 0
              Watchers: 4
