Spark / SPARK-25512

Using RowNumbers in SparkR Dataframe

Details

    • Type: Question
    • Status: Resolved
    • Priority: Critical
    • Resolution: Invalid
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: SparkR
    • Labels: None

    Description

      Hi,

      I have a use case where I have a SparkR DataFrame and I want to iterate over it in a for loop using its row numbers. Is this possible?

      The only solution I have now is to collect() the SparkR DataFrame into an R data.frame, which brings the entire dataset onto the driver node, and then iterate over it using row numbers. But since the for loop executes only on the driver node, I don't get the advantage of parallel processing in Spark, which was the whole purpose of using Spark. Please help.

      Thank You,

      Asif Khan
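
One possible direction for the use case described above, as a rough sketch rather than a confirmed answer from this ticket: attach a row number as an ordinary column with SparkR's `row_number()` window function, then replace the driver-side for loop with `dapply()`, which runs an R function on each partition on the executors. The sample data, the column names (`name`, `value`, `row_num`, `result`), the local master, and the per-row computation are all hypothetical placeholders.

```r
library(SparkR)

# Assumption: a local session for illustration; a real job would use its cluster settings.
sparkR.session(master = "local[2]", appName = "row-number-sketch")

# Hypothetical data standing in for the real SparkDataFrame.
local_df <- data.frame(
  name = c("a", "b", "c", "d"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
sdf <- createDataFrame(local_df)

# Add a consecutive, 1-based row number ordered by an existing column.
# A window without partitioning pulls all rows into a single partition,
# so this step does not scale well to very large data.
sdf <- withColumn(sdf, "row_num", over(row_number(), windowOrderBy("value")))

# Schema of the data.frame returned by the function passed to dapply().
schema <- structType(
  structField("name", "string"),
  structField("value", "double"),
  structField("row_num", "integer"),
  structField("result", "double")
)

# Instead of looping over rows on the driver, apply the per-row logic to each
# partition; `part` is a plain R data.frame holding that partition's rows.
out <- dapply(sdf, function(part) {
  part$result <- part$value * part$row_num  # placeholder per-row computation
  part
}, schema)

head(arrange(out, "row_num"))
```

Because the unpartitioned window leaves the data in one partition, a real job would probably call `repartition()` before `dapply()` to spread the work across executors. If the row number is only needed as a unique key rather than a consecutive index, `monotonically_increasing_id()` avoids the single-partition window.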


          People

            Assignee: Unassigned
            Reporter: Asif Khan (asif3051@gmail.com)