Spark / SPARK-25393

Parsing CSV strings in a column


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      There are use cases where CSV-formatted content is dumped into external storage as one column among others. For example, CSV records are stored together with other metadata in Kafka. The current Spark API doesn't allow parsing such columns directly: the existing csv() method requires a dataset with a single string column, which is inconvenient when the CSV column sits in a dataset with many columns. This ticket aims to add a new function, similar to from_json(), with the following signature in Scala:

      def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column
      

      and an overload for use from Python, R, and Java:

      def from_csv(e: Column, schema: String, options: java.util.Map[String, String]): Column
      
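As a rough illustration of the semantics the proposed function would provide, here is a minimal sketch in plain Scala (no Spark dependency): parse one CSV-formatted string into typed fields according to a schema. The `Field` and `Value` types and the `parseCsvLine` helper are hypothetical stand-ins for Spark's `StructType` and internal CSV parser, not part of the proposed API.

```scala
object FromCsvSketch {
  // Hypothetical typed result values, standing in for Spark's internal rows.
  sealed trait Value
  final case class IntValue(v: Int) extends Value
  final case class StringValue(v: String) extends Value

  // Hypothetical schema entry, standing in for a StructField.
  final case class Field(name: String, dataType: String)

  // Split a CSV line on the separator and cast each token per the schema,
  // roughly what from_csv(col, schema, Map("sep" -> sep)) would do per row.
  def parseCsvLine(line: String, schema: Seq[Field], sep: String = ","): Map[String, Value] =
    schema.zip(line.split(sep).map(_.trim)).map { case (field, token) =>
      val value = field.dataType match {
        case "int" => IntValue(token.toInt)
        case _     => StringValue(token)
      }
      field.name -> value
    }.toMap
}
```

With the proposed API itself, the intended usage would presumably be a column-level call such as `df.select(from_csv($"value", schema, options))`, applying this per-row parsing without first reshaping the dataset into a single string column.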


            People

            • Assignee: Maxim Gekk (maxgekk)
            • Reporter: Maxim Gekk (maxgekk)
            • Votes: 0
            • Watchers: 4
