  1. Flink
  2. FLINK-21562

Add more informative message on CSV parsing errors

Details

    Description

      I was parsing a CSV file that contains comments. In the table DDL, I set 'csv.allow-comments' = 'true' but deliberately did not set 'csv.ignore-parse-errors' = 'true', so that genuine format errors would not be hidden.
      Since my table does not consist only of string columns, parsing failed on the commented-out line with the following error:

      2021-02-16 17:45:53,055 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - Source: TableSourceScan(table=[[default_catalog, default_database, airports]], fields=[IATA_CODE, AIRPORT, CITY, STATE, COUNTRY, LATITUDE, LONGITUDE]) -> SinkConversionToTuple2 -> Sink: SQL Client Stream Collect Sink (1/1)#0 (9f3a3965f18ed99ee42580bdb559ba66) switched from RUNNING to FAILED.
      java.io.IOException: Failed to deserialize CSV row.
      	at org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:257) ~[flink-csv-1.12.1.jar:1.12.1]
      	at org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:162) ~[flink-csv-1.12.1.jar:1.12.1]
      	at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:90) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
      	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
      	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
      	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:241) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
      Caused by: java.lang.NumberFormatException: empty String
      	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842) ~[?:1.8.0_275]
      	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) ~[?:1.8.0_275]
      	at java.lang.Double.parseDouble(Double.java:538) ~[?:1.8.0_275]
      	at org.apache.flink.formats.csv.CsvToRowDataConverters.convertToDouble(CsvToRowDataConverters.java:203) ~[flink-csv-1.12.1.jar:1.12.1]
      	at org.apache.flink.formats.csv.CsvToRowDataConverters.lambda$createNullableConverter$ac6e531e$1(CsvToRowDataConverters.java:113) ~[flink-csv-1.12.1.jar:1.12.1]
      	at org.apache.flink.formats.csv.CsvToRowDataConverters.lambda$createRowConverter$18bb1dd$1(CsvToRowDataConverters.java:98) ~[flink-csv-1.12.1.jar:1.12.1]
      	at org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:251) ~[flink-csv-1.12.1.jar:1.12.1]
      	... 5 more
      

      Two things should be improved here:

      1. Commented-out lines should be ignored by default (FLINK-17133 potentially addresses this, or at least gives the user the power to do so).
      2. The error message itself is not very informative: "empty String".

      This ticket is about the latter. I suggest adding at least a few more pointers that direct the user to the source of the problem in the CSV file/item/...: here, the declared data type could simply be wrong, or the CSV file itself may be wrong or corrupted, and the user would need to investigate.
      What exactly helps probably depends on the input connector the format is working with. For example, the line number in a CSV file would be best; where that is not possible, we could show the whole line or at least a few surrounding fields.
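
To illustrate the kind of message I have in mind, here is a minimal, standalone sketch. It is not Flink's actual code: the class CsvErrorContextSketch, the method parseDoubleField, and its parameters (fieldName, lineNumber, rawLine) are hypothetical names made up for this example, and the sample line is invented. It only shows how the low-level NumberFormatException could be wrapped with enough context to locate the offending input, assuming the format has access to that context.

```java
import java.io.IOException;

// Hypothetical sketch, not part of Flink: shows one way to attach context
// (field name, line number, raw line) to a CSV number-parsing failure.
public class CsvErrorContextSketch {

    /**
     * Parses one CSV field as a double. On failure, wraps the low-level
     * NumberFormatException in an IOException that names the field, the
     * line number, and the raw line, so the user can locate the bad input.
     */
    static double parseDoubleField(
            String rawValue, String fieldName, long lineNumber, String rawLine)
            throws IOException {
        try {
            return Double.parseDouble(rawValue);
        } catch (NumberFormatException e) {
            String message = String.format(
                    "Failed to deserialize CSV row: cannot parse field '%s' as DOUBLE "
                            + "(value: '%s', line %d: '%s').",
                    fieldName, rawValue, lineNumber, rawLine);
            // Keep the original exception as the cause so no detail is lost.
            throw new IOException(message, e);
        }
    }

    public static void main(String[] args) {
        // Invented sample input: a comment line that splits into empty fields.
        String rawLine = "# airports,,";
        String[] fields = rawLine.split(",", -1);
        try {
            parseDoubleField(fields[1], "LATITUDE", 2L, rawLine);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether a line number is available at all depends on the connector, as noted above; in that case the raw line or the neighbouring fields alone would already be a big improvement over "empty String".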

            People

              Assignee: Unassigned
              Reporter: Nico Kruber (nkruber)