Spark / SPARK-25393

Parsing CSV strings in a column

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      There are use cases in which content in CSV format is written to external storage as one column among several. For example, CSV records may be stored in Kafka together with other meta-information. The current Spark API does not allow such columns to be parsed directly: the existing csv() method requires a dataset with a single string column, which makes it inconvenient for parsing a CSV column in a dataset with many columns. This ticket aims to add a new function, similar to from_json(), with the following signature in Scala:

      def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column
      

      and, for use from Python, R and Java:

      def from_csv(e: Column, schema: String, options: java.util.Map[String, String]): Column
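
      To illustrate the intended use, here is a minimal sketch that parses a CSV column sitting next to a metadata column. It assumes a Spark build that already ships from_csv with the StructType signature proposed above; the object name FromCsvSketch, the helper parsePayload, the column names (source, payload, id, name) and the sample rows are all hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_csv}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical example object; none of these names come from the ticket.
object FromCsvSketch {

  // Assumed layout of the CSV payload: an integer id followed by a name.
  val payloadSchema: StructType = StructType(Seq(
    StructField("id", IntegerType),
    StructField("name", StringType)
  ))

  // Parses the "payload" column in place and keeps the other column,
  // which csv() on a single-string-column dataset cannot do directly.
  def parsePayload(records: DataFrame): DataFrame =
    records
      .withColumn("parsed", from_csv(col("payload"), payloadSchema, Map("sep" -> ";")))
      .select(col("source"), col("parsed.id"), col("parsed.name"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("from_csv sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for records read from Kafka: metadata plus a CSV string.
    val records = Seq(
      ("topic-a", "1;Alice"),
      ("topic-b", "2;Bob")
    ).toDF("source", "payload")

    parsePayload(records).show()
    spark.stop()
  }
}
```

      The options map is passed through to the CSV parser in the same way as for the csv() reader; "sep" is used here only to show that a non-default separator can be supplied per call.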
      

          People

            Assignee: Max Gekk (maxgekk)
            Reporter: Max Gekk (maxgekk)
            Votes: 0
            Watchers: 3
