Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
Description
PostgresSQL by default creates CSV files which uses absent value for database NULL values. Consider the following test in PostgresSQL:
CREATE TABLE mytab ( col1 varchar(10), col2 int, col3 varchar(10) ); INSERT INTO mytab VALUES ('AA', 33, null); INSERT INTO mytab VALUES ('AA', null, ''); INSERT INTO mytab VALUES (null, 33, 'CC');
and then exporting it to CSV:
\copy mytab TO STDOUT WITH csv
will produce CSV output as follows:
AA,33, AA,,"" ,33,CC
which the library currently will parse as follows using the current POSTGRESQL_CSV format:
"AA","33",null "AA",null,null null,"33","CC"
Row no 2 is incorrect when comparing to the actual database table contents.
The fix
Therefore the declaration of POSTGRESQL_CSV must be changed to:
public static final CSVFormat POSTGRESQL_CSV = DEFAULT .withDelimiter(COMMA) .withEscape(DOUBLE_QUOTE_CHAR) .withIgnoreEmptyLines(false) .withQuote(DOUBLE_QUOTE_CHAR) .withRecordSeparator(LF) .withAbsentMeansNull(true) // added .withNullString(EMPTY) .withQuoteMode(QuoteMode.ALL_NON_NULL);
(this depends on PR51)
With the above the parser will give the following result instead:
"AA","33",null "AA",null,"" null,"33","CC"
which is the expected result.
Attachments
Issue Links
- is related to
-
CSV-253 Handle absent values in input (null)
- Resolved