  1. Flink
  2. FLINK-2186

Rework CSV import to support very wide files

Details

    Description

      With the current readCsvFile implementation, importing CSV files with many columns ranges from cumbersome to impossible.

      For example, to import an 11-column file we need to write:

      val cancer = env.readCsvFile[(String, String, String, String, String, String, String, String, String, String, String)]("/path/to/breast-cancer-wisconsin.data")
      

      For many use cases in Machine Learning, we might have CSV files with thousands or millions of columns that we want to import as vectors.
      In that case, using the current readCsvFile method becomes impossible: every column has to be spelled out in the type parameter, and Scala tuples stop at 22 fields.

      We therefore need to rework the current function, or create a new one that will allow us to import CSV files with an arbitrary number of columns.
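
      As a rough illustration of the kind of import this asks for, the sketch below parses one delimited line into a dense array of doubles, independent of the number of columns. It is plain Scala and not part of Flink: the WideCsv object, the parseLine name, and the choice to map unparseable fields (such as empty or "?" entries) to NaN are all assumptions made for this example.

```scala
import java.util.regex.Pattern
import scala.util.Try

// Hypothetical helper, not a Flink API: turns one CSV line into a numeric vector.
object WideCsv {
  def parseLine(line: String, delimiter: Char = ','): Array[Double] = {
    // Quote the delimiter so characters like '|' are not treated as regex syntax.
    val pattern = Pattern.quote(delimiter.toString)
    // A limit of -1 keeps trailing empty fields, so the column count stays stable.
    line.split(pattern, -1).map { field =>
      // Assumption: fields that are not valid numbers become NaN instead of failing the import.
      Try(field.trim.toDouble).getOrElse(Double.NaN)
    }
  }
}
```

      Until the import is reworked, one possible workaround is to read the file as text and apply such a function per line, for example env.readTextFile("/path/to/file").map(WideCsv.parseLine(_)) with org.apache.flink.api.scala._ imported. This bypasses readCsvFile entirely and does not handle quoted fields.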

            People

              Assignee: Unassigned
              Reporter: Theodore Vasiloudis (tvas)
              Votes: 0
              Watchers: 7

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 10m