[CSV-254] POSTGRESQL_CSV cannot parse correctly (null vs zero-length string) - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
None

Description

PostgresSQL by default creates CSV files which uses absent value for database NULL values. Consider the following test in PostgresSQL:

CREATE TABLE mytab (
	col1   varchar(10),
	col2   int,
	col3   varchar(10)
);
INSERT INTO mytab VALUES ('AA', 33, null);
INSERT INTO mytab VALUES ('AA', null, '');
INSERT INTO mytab VALUES (null, 33, 'CC');

and then exporting it to CSV:

\copy mytab TO STDOUT WITH csv

will produce CSV output as follows:

AA,33,
AA,,""
,33,CC

which the library currently will parse as follows using the current POSTGRESQL_CSV format:

"AA","33",null
"AA",null,null
null,"33","CC"

Row no 2 is incorrect when comparing to the actual database table contents.

The fix

Therefore the declaration of POSTGRESQL_CSV must be changed to:

public static final CSVFormat POSTGRESQL_CSV = DEFAULT
            .withDelimiter(COMMA)
            .withEscape(DOUBLE_QUOTE_CHAR)
            .withIgnoreEmptyLines(false)
            .withQuote(DOUBLE_QUOTE_CHAR)
            .withRecordSeparator(LF)
            .withAbsentMeansNull(true)   // added
            .withNullString(EMPTY)
            .withQuoteMode(QuoteMode.ALL_NON_NULL);

(this depends on PR51)

With the above the parser will give the following result instead:

"AA","33",null
"AA",null,""
null,"33","CC"

which is the expected result.

Attachments

Issue Links

is related to

CSV-253 Handle absent values in input (null)

Resolved

Activity

People

Assignee:: Unassigned

Reporter:: Lars Bruun-Hansen

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 30/Oct/19 19:25

Updated:: 15/Oct/22 19:58