[FLINK-10684] Improve the CSV reading process - ASF JIRA


Details

Type: Improvement
Status: Open
Priority: Not a Priority
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: API / DataSet
Labels:

Description

CSV is one of the most commonly used file formats in data wrangling. To load records from CSV files, Flink provides the basic CsvInputFormat as well as several variants (e.g., RowCsvInputFormat and PojoCsvInputFormat). However, the reading process leaves room for improvement. For example, we could add a built-in utility that automatically infers a schema from the CSV header and a sample of the data. The current handling of bad records could also be improved: instead of logging only the total number of invalid lines, the reader could retain the invalid lines themselves (and ideally the reason each one failed to parse).
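To make the two proposals concrete, here is a minimal standalone sketch in plain Java. It is not a Flink API: the class CsvSchemaSketch and its members (ColumnType, BadRecord, inferSchema, parse) are hypothetical names used for illustration only. It assumes naive comma splitting (no quoting or escaping), treats empty fields as nulls, and picks the most specific of LONG, DOUBLE, BOOLEAN or STRING that fits every non-empty sample value; all of these are assumptions, not decisions made in this issue. Rows that do not fit the inferred schema are kept together with a reason rather than only being counted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not a Flink API: infer column types from a CSV header plus
// sample rows, then parse rows against that schema, keeping each bad record and
// the reason it failed. Assumes plain comma splitting (no quotes or escapes).
public class CsvSchemaSketch {
    enum ColumnType { LONG, DOUBLE, BOOLEAN, STRING }

    // A rejected input line, retained instead of only being counted.
    static final class BadRecord {
        final long lineNumber; final String line; final String reason;
        BadRecord(long lineNumber, String line, String reason) {
            this.lineNumber = lineNumber; this.line = line; this.reason = reason;
        }
        @Override public String toString() { return "line " + lineNumber + " [" + line + "]: " + reason; }
    }

    static String[] split(String line) { return line.split(",", -1); } // -1 keeps trailing empty fields
    static boolean isBoolean(String v) { return v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false"); }
    static boolean parses(String v, Consumer<String> parser) {
        try { parser.accept(v); return true; } catch (NumberFormatException e) { return false; }
    }

    // Picks the most specific type that every non-empty sample value fits.
    static ColumnType inferType(List<String> values) {
        int seen = 0;
        boolean allLong = true, allDouble = true, allBool = true;
        for (String v : values) {
            if (v.isEmpty()) continue; // empty fields are treated as nulls
            seen++;
            allLong &= parses(v, Long::parseLong);
            allDouble &= parses(v, Double::parseDouble);
            allBool &= isBoolean(v);
        }
        if (seen == 0) return ColumnType.STRING;
        return allLong ? ColumnType.LONG : allDouble ? ColumnType.DOUBLE
                : allBool ? ColumnType.BOOLEAN : ColumnType.STRING;
    }

    static List<ColumnType> inferSchema(String header, List<String> sample) {
        int width = split(header).length;
        List<ColumnType> schema = new ArrayList<>();
        for (int c = 0; c < width; c++) {
            List<String> column = new ArrayList<>();
            for (String line : sample) {
                String[] f = split(line);
                if (f.length == width) column.add(f[c]); // ignore malformed sample rows
            }
            schema.add(inferType(column));
        }
        return schema;
    }

    static Object convert(String v, ColumnType t) {
        if (v.isEmpty()) return null;
        switch (t) {
            case LONG: return Long.parseLong(v);
            case DOUBLE: return Double.parseDouble(v);
            case BOOLEAN: if (isBoolean(v)) return Boolean.parseBoolean(v); throw new IllegalArgumentException(v);
            default: return v;
        }
    }

    // Parses data lines (header excluded); valid rows go to `good`, failures to `bad`.
    static void parse(List<String> lines, List<ColumnType> schema, List<Object[]> good, List<BadRecord> bad) {
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            long lineNo = i + 2; // 1-based file line number, counting the header
            String[] f = split(line);
            String error = f.length == schema.size() ? null
                    : "expected " + schema.size() + " fields but found " + f.length;
            Object[] row = new Object[f.length];
            for (int c = 0; error == null && c < f.length; c++) {
                try {
                    row[c] = convert(f[c], schema.get(c));
                } catch (IllegalArgumentException e) { // also covers NumberFormatException
                    error = "column " + c + ": cannot parse '" + f[c] + "' as " + schema.get(c);
                }
            }
            if (error == null) good.add(row);
            else bad.add(new BadRecord(lineNo, line, error));
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,9.99,true", "2,4.5,false", "3,oops,true", "4,7");
        List<ColumnType> schema = inferSchema("id,price,active", lines.subList(0, 2));
        List<Object[]> good = new ArrayList<>();
        List<BadRecord> bad = new ArrayList<>();
        parse(lines, schema, good, bad);
        System.out.println("schema = " + schema); // schema = [LONG, DOUBLE, BOOLEAN]
        System.out.println("good rows = " + good.size()); // good rows = 2
        bad.forEach(System.out::println);
        // line 4 [3,oops,true]: column 1: cannot parse 'oops' as DOUBLE
        // line 5 [4,7]: expected 3 fields but found 2
    }
}
```

Inference here depends entirely on the sample: if the sampled rows are unrepresentative (for example, a numeric column whose first rows happen to be integers), later values will be rejected as bad records rather than widening the schema. Collecting BadRecord objects instead of a counter is what makes that kind of mismatch visible to the user.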

This is an umbrella issue for all the improvements and bug fixes for the CSV reading process.


Issue Links

is cloned by

FLINK-29689 NIFI Performance issue - ConvertExcelToCSVProcessor to handle more data (Closed)


People

Assignee: Unassigned

Reporter: Xingcan Cui

Votes: 0

Watchers: 6

Dates

Created: 26/Oct/18 04:20

Updated: 19/Oct/22 12:14