[FLINK-2988] Cannot load DataSet[Row] from CSV file

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 0.10.0

    Description

      Tuple classes (in both the Java and Scala APIs) only go up to arity 25, meaning I cannot load a CSV file with more than 25 columns directly as a DataSet[TupleX[...]].
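
      For context, the tuple-based path works only while the column count stays within that limit. A minimal sketch (the file path and field types are purely illustrative):

      import org.apache.flink.api.scala._

      val env = ExecutionEnvironment.createLocalEnvironment()
      // Two columns map cleanly onto a Tuple2 ...
      val pairs: DataSet[(String, Int)] = env.readCsvFile[(String, Int)]("../someCsv.csv")
      // ... but there is no Tuple26, so a file with more than 25 columns
      // cannot be typed as a DataSet of tuples at all.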

      An alternative to Tuples is the Table API's Row class, which supports arbitrary-arity, arbitrary-type, runtime-supplied schemata (via RowTypeInfo) and index-based field access.
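
      For illustration, a Row is created with a fixed arity and is read and written purely by position; this sketch assumes the 0.10-era org.apache.flink.api.table.Row, which exposes setField and (via Product) productElement for index-based access:

      import org.apache.flink.api.table.Row

      val row = new Row(2)
      row.setField(0, "one")          // field 0: "word"
      row.setField(1, 1)              // field 1: "number"
      println(row.productElement(1))  // index-based read; prints 1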

      However, trying to load a CSV file as a DataSet[Row] yields an exception:

      import org.apache.flink.api.common.typeinfo.BasicTypeInfo
      import org.apache.flink.api.scala.ExecutionEnvironment
      import org.apache.flink.api.table.Row
      import org.apache.flink.api.table.typeinfo.RowTypeInfo
      import scala.reflect.ClassTag
      val env = ExecutionEnvironment.createLocalEnvironment()
      val filePath = "../someCsv.csv"
      val typeInfo = new RowTypeInfo(Seq(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO), Seq("word", "number"))
      val source = env.readCsvFile(filePath)(ClassTag(classOf[Row]), typeInfo)
      println(source.collect())
      

      with someCsv.csv containing:

      one,1
      two,2
      

      yields

      Exception in thread "main" java.lang.ClassCastException: org.apache.flink.api.table.typeinfo.RowSerializer cannot be cast to org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase
      	at org.apache.flink.api.scala.operators.ScalaCsvInputFormat.<init>(ScalaCsvInputFormat.java:46)
      	at org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile(ExecutionEnvironment.scala:282)
      

      As a user I would like to be able to load a CSV file into a DataSet[Row], preferably with a convenience method for specifying the schema (RowTypeInfo), so that I do not have to use the "explicit implicit parameters" syntax or pass the ClassTag manually.
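
      Until such support exists, the closest workaround I can think of bypasses the CSV input format entirely: read the file as plain text and assemble Rows by hand. This is only a sketch, not an existing Flink API; it assumes the 0.10 Table API Row (setField), assumes the locally defined implicit TypeInformation[Row] is picked up by map, and its naive split handles neither quoting nor escaping:

      import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
      import org.apache.flink.api.scala._
      import org.apache.flink.api.table.Row
      import org.apache.flink.api.table.typeinfo.RowTypeInfo

      val env = ExecutionEnvironment.createLocalEnvironment()
      val rowType = new RowTypeInfo(
        Seq(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
        Seq("word", "number"))

      // Supply the RowTypeInfo implicitly so map(...) yields a DataSet[Row]
      // backed by the RowSerializer rather than a generic fallback.
      implicit val rowInfo: TypeInformation[Row] = rowType

      val rows: DataSet[Row] = env.readTextFile("../someCsv.csv").map { line =>
        val cols = line.split(",")
        val row = new Row(2)
        row.setField(0, cols(0))        // "word"   as String
        row.setField(1, cols(1).toInt)  // "number" as Int
        row
      }
      println(rows.collect())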

People

    Assignee: Unassigned
    Reporter: Johann Kovacs (jkovacs)