Spark / SPARK-25393

Parsing CSV strings in a column


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      There are use cases where CSV-formatted content is dumped into external storage as one column among others. For example, CSV records are stored together with other metadata in Kafka. The current Spark API doesn't allow parsing such columns directly: the existing csv() method requires a dataset with a single string column, which is inconvenient when the CSV column sits in a dataset with many columns. This ticket aims to add a new function, similar to from_json(), with the following signature in Scala:

      def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column
      

      and an overload for use from Python, R, and Java:

      def from_csv(e: Column, schema: String, options: java.util.Map[String, String]): Column
      
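As a rough illustration of the semantics the proposed function would provide, here is a minimal sketch in plain Scala (no Spark dependency): parse one CSV-formatted string into typed fields according to a schema. The `Field` and `Value` types and the `parseCsvLine` helper are hypothetical stand-ins for Spark's `StructType` and internal CSV parser, not part of the proposed API.

```scala
object FromCsvSketch {
  // Hypothetical typed result values, standing in for Spark's internal rows.
  sealed trait Value
  final case class IntValue(v: Int) extends Value
  final case class StringValue(v: String) extends Value

  // Hypothetical schema entry, standing in for a StructField.
  final case class Field(name: String, dataType: String)

  // Split a CSV line on the separator and cast each token per the schema,
  // roughly what from_csv(col, schema, Map("sep" -> sep)) would do per row.
  def parseCsvLine(line: String, schema: Seq[Field], sep: String = ","): Map[String, Value] =
    schema.zip(line.split(sep).map(_.trim)).map { case (field, token) =>
      val value = field.dataType match {
        case "int" => IntValue(token.toInt)
        case _     => StringValue(token)
      }
      field.name -> value
    }.toMap
}
```

With the proposed API itself, the intended usage would presumably be a column-level call such as `df.select(from_csv($"value", schema, options))`, applying this per-row parsing without first reshaping the dataset into a single string column.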


            People

            • Assignee: Maxim Gekk (maxgekk)
            • Reporter: Maxim Gekk (maxgekk)
            • Votes: 0
            • Watchers: 4
