Hivemall / HIVEMALL-61

Support a function to convert a comma-separated string into typed data and vice versa


    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Labels:
      None

      Description

      Currently, Spark does not have this feature (IMO it will not become a first-class feature in Spark), but it is useful for ETL before ML processing.
      For example:

      scala> val ds1 = Seq("""1,abc""").toDS()
      ds1: org.apache.spark.sql.Dataset[String] = [value: string]
      
      scala> val schema = new StructType().add("a", IntegerType).add("b", StringType)
      schema: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true), StructField(b,StringType,true))
      
      scala> val ds2 = ds1.select(from_csv($"value", schema))
      ds2: org.apache.spark.sql.DataFrame = [csvtostruct(value): struct<a: int, b: string>]
      
      scala> ds2.printSchema
      root
       |-- csvtostruct(value): struct (nullable = true)
       |    |-- a: integer (nullable = true)
       |    |-- b: string (nullable = true)
      
      
      scala> ds2.show
      +------------------+
      |csvtostruct(value)|
      +------------------+
      |           [1,abc]|
      +------------------+
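
      To make both directions in the title concrete, here is a minimal, dependency-free sketch of the conversion. It is not the Hivemall or Spark implementation: `CsvSketch`, `ColType`, `IntCol`, `StrCol`, `fromCsv`, and `toCsv` are hypothetical names introduced only for illustration, and the parser assumes a naive split on commas with no quoting or escaping.

```scala
// Hypothetical, dependency-free sketch; not Hivemall's or Spark's implementation.
object CsvSketch {
  // Minimal stand-ins for the two column types used in the session above.
  sealed trait ColType
  case object IntCol extends ColType
  case object StrCol extends ColType

  // Parse one comma-separated line into typed values, following schema order.
  // Assumption: naive split, so quoted fields and escaped commas are not handled.
  // A field that is missing or fails to parse becomes None (similar to a SQL NULL).
  def fromCsv(line: String, schema: Seq[(String, ColType)]): Seq[Option[Any]] = {
    val fields = line.split(",", -1)
    schema.zipWithIndex.map { case ((_, tpe), i) =>
      if (i >= fields.length) None
      else tpe match {
        case IntCol => scala.util.Try(fields(i).trim.toInt).toOption
        case StrCol => Some(fields(i))
      }
    }
  }

  // The reverse direction: render typed values back into one line.
  // None is written as an empty field.
  def toCsv(row: Seq[Option[Any]]): String =
    row.map(_.fold("")(_.toString)).mkString(",")
}
```

      With a schema of `Seq("a" -> CsvSketch.IntCol, "b" -> CsvSketch.StrCol)`, `CsvSketch.fromCsv("1,abc", schema)` yields `Seq(Some(1), Some("abc"))`, and passing that row to `CsvSketch.toCsv` gives back `1,abc`.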
      

      A related discussion is here: https://github.com/apache/spark/pull/13300#issuecomment-261962773


              People

              • Assignee:
                maropu Takeshi Yamamuro
                Reporter:
                maropu Takeshi Yamamuro
              • Votes:
                0
                Watchers:
                2
