Spark / SPARK-21358

Argument of repartitionAndSortWithinPartitions in PySpark

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 2.3.0
    • Component/s: Documentation, Examples
    • Labels:
      None

      Description

      In rdd.py, repartitionAndSortWithinPartitions is defined as follows:

       def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
                                                 ascending=True, keyfunc=lambda x: x):
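To make the role of each parameter concrete, here is a minimal pure-Python sketch that models the call on a local list of (key, value) pairs. It assumes that each pair is routed to partition `partitionFunc(key) % numPartitions` and that each partition is then sorted by `keyfunc(key)`, in descending order when `ascending` is falsy. The function `local_repartition_and_sort` is hypothetical: it is not PySpark's implementation and ignores shuffling, spilling, and serialization.

```python
from typing import Any, Callable, List, Tuple

Pair = Tuple[Any, Any]


def local_repartition_and_sort(
    records: List[Pair],
    num_partitions: int,
    partition_func: Callable[[Any], int] = hash,  # PySpark defaults to portable_hash
    ascending: bool = True,
    keyfunc: Callable[[Any], Any] = lambda k: k,
) -> List[List[Pair]]:
    """Hypothetical local model of repartitionAndSortWithinPartitions."""
    # Step 1: route each (key, value) pair to bucket partition_func(key) % num_partitions.
    buckets: List[List[Pair]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[partition_func(key) % num_partitions].append((key, value))
    # Step 2: sort each bucket on its own, comparing keyfunc(key) only;
    # ascending=False (or any falsy value) gives descending order.
    return [
        sorted(bucket, key=lambda kv: keyfunc(kv[0]), reverse=not ascending)
        for bucket in buckets
    ]


data = [(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)]
print(local_repartition_and_sort(data, 2, lambda x: x % 2, True))
# prints [[(0, 5), (0, 8), (2, 6)], [(1, 3), (3, 8), (3, 8)]]
```

Python's sort is stable, so pairs with equal keys keep their input order in this model; a real shuffle may not preserve that order.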
      

      The documentation contains the following example script:

              >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
              >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
      

      The third argument (ascending) is expected to be a boolean, so I think the following script is better:

              >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
              >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
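Passing `2` does not raise an error, which is why the value is misleading rather than broken. Assuming the sort direction is derived from `not ascending` (a truthiness test, not a type check), any nonzero integer behaves like `True`. The sketch below uses a hypothetical helper, `sort_within_partition`, on one partition's worth of pairs to show this.

```python
from typing import Any, List, Tuple


def sort_within_partition(
    partition: List[Tuple[Any, Any]], ascending: Any = True
) -> List[Tuple[Any, Any]]:
    # Hypothetical helper: the direction comes from truthiness, so non-bools are accepted.
    return sorted(partition, key=lambda kv: kv[0], reverse=not ascending)


partition = [(0, 5), (2, 6), (0, 8)]
for flag in (True, 2, False, 0):
    # not 2 is False, so 2 sorts exactly like True; 0 sorts like False.
    print(repr(flag), sort_within_partition(partition, flag))
# True [(0, 5), (0, 8), (2, 6)]
# 2 [(0, 5), (0, 8), (2, 6)]
# False [(2, 6), (0, 5), (0, 8)]
# 0 [(2, 6), (0, 5), (0, 8)]
```

Because nothing fails, a reader can easily mistake the `2` for another partition count; passing `True`, or `ascending=True` by keyword, states the intent directly.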
      


            People

            • Assignee:
              chie hayashida (hayashidac)
              Reporter:
              chie hayashida (hayashidac)
            • Votes:
              0
              Watchers:
              2
