Details
Description
I ran the code from the attached SO issue and can confirm that it doesn't detect semicolon-separated files. The cause is this line in TextAndCSVParser.java:
private static final char[] DEFAULT_DELIMITERS = new char[]{',', '\t'};
This array is later used by CSVSniffer. For some reason the other delimiters (pipe, colon and semicolon) aren't in it, although they are in CHAR_TO_STRING_DELIMITER_MAP. I modified DEFAULT_DELIMITERS locally and detection now works for semicolons.
Can I change this by adding the missing delimiters, or is there a reason for the omission that I missed? The proposed change would make CSVSniffer take its delimiters as a set and then pass it CHAR_TO_STRING_DELIMITER_MAP.keySet().
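To illustrate the idea, here is a minimal, self-contained sketch of why the candidate set matters. It is not Tika's code: DelimiterSketch and guessDelimiter are hypothetical names, the map contents are assumed from the delimiters named above (the string values are placeholders), and the naive "most frequent character wins" count merely stands in for whatever scoring CSVSniffer really does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration only; not Tika's CSVSniffer.
class DelimiterSketch {

    // Assumed contents: the five delimiters mentioned in this issue.
    // The string values are placeholders, not Tika's actual values.
    static final Map<Character, String> CHAR_TO_STRING_DELIMITER_MAP = new LinkedHashMap<>();
    static {
        CHAR_TO_STRING_DELIMITER_MAP.put(',', "comma");
        CHAR_TO_STRING_DELIMITER_MAP.put('\t', "tab");
        CHAR_TO_STRING_DELIMITER_MAP.put('|', "pipe");
        CHAR_TO_STRING_DELIMITER_MAP.put(';', "semicolon");
        CHAR_TO_STRING_DELIMITER_MAP.put(':', "colon");
    }

    // Returns the candidate that occurs most often in the sample,
    // or null if no candidate occurs at all.
    static Character guessDelimiter(String sample, Set<Character> candidates) {
        Character best = null;
        int bestCount = 0;
        for (char candidate : candidates) {
            int count = 0;
            for (int i = 0; i < sample.length(); i++) {
                if (sample.charAt(i) == candidate) {
                    count++;
                }
            }
            if (count > bestCount) {
                bestCount = count;
                best = candidate;
            }
        }
        return best;
    }
}
```

With only comma and tab as candidates, a sample such as "a;b;c" can never resolve to a semicolon, whatever the scoring looks like. Passing CHAR_TO_STRING_DELIMITER_MAP.keySet() instead makes every delimiter the parser already knows about eligible for detection.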
Attachments
Issue Links
- is duplicated by TIKA-3155 Parse Error while extracting CSV files (Closed)
- links to