[SPARK-23074] Dataframe-ified zipwithindex - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: Spark Core
Labels:

Description

It would be great to have a DataFrame-friendly equivalent of rdd.zipWithIndex():

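import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Adds a Long index column to a DataFrame via RDD.zipWithIndex.
//   offset:  value added to the 0-based index (the default 1 gives 1-based ids)
//   colName: name of the new index column
//   inFront: put the index column first (true) or last (false)
def dfZipWithIndex(
  df: DataFrame,
  offset: Int = 1,
  colName: String = "id",
  inFront: Boolean = true
): DataFrame = {
  val indexField = StructField(colName, LongType, nullable = false)
  // Extend the schema with the index field on the requested side.
  val schema =
    if (inFront) StructType(indexField +: df.schema.fields)
    else StructType(df.schema.fields :+ indexField)
  // Pair each Row with its index, then rebuild the Row with the index added.
  val rows = df.rdd.zipWithIndex.map { case (row, idx) =>
    val id = idx + offset
    Row.fromSeq(if (inFront) id +: row.toSeq else row.toSeq :+ id)
  }
  df.sparkSession.createDataFrame(rows, schema)
}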

credits: https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex
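
For readers unfamiliar with the RDD method the request builds on, here is a minimal sketch of what rdd.zipWithIndex() yields when applied to a DataFrame's rows. The object name, the method name indexLetters, the column names, and the sample data are hypothetical and chosen only for illustration; the sketch assumes a SparkSession is supplied by the caller.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ZipWithIndexSketch {
  // Hypothetical helper: builds a tiny DataFrame and attaches a 1-based id.
  def indexLetters(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val df = Seq("a", "b", "c").toDF("letter")

    // zipWithIndex pairs each Row with a 0-based Long index, ordered by
    // partition index and then by position within each partition.
    // Adding 1 mirrors the default offset used in the ticket's helper.
    df.rdd.zipWithIndex
      .map { case (row, idx) => (idx + 1, row.getString(0)) }
      .toDF("id", "letter")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("zipWithIndex-sketch")
      .getOrCreate()
    try indexLetters(spark).show()
    finally spark.stop()
  }
}
```

Note that RDD.zipWithIndex has to run a Spark job up front when the RDD has more than one partition, since it needs the size of each partition to compute the indices, and the round trip through df.rdd leaves the DataFrame API. Both costs are part of why a native DataFrame version would be useful.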

Issue Links

is related to: SPARK-24042 High-order function: zip_with_index (Resolved)

People

Assignee: Unassigned

Reporter: Ruslan Dautkhanov

Votes: 6

Watchers: 6

Dates

Created: 15/Jan/18 06:25

Updated: 30/Sep/23 07:58

Resolved: 08/Oct/19 05:43