Issue 82952 - CSV import is wrong on CR (0x13) value
Summary: CSV import is wrong on CR (0x13) value
Status: CONFIRMED
Alias: None
Product: Calc
Classification: Application
Component: open-import
Version: OOo 2.2 RC4
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Duplicates: 81470 83768 95958 98274 106740
Depends on:
Blocks:
 
Reported: 2007-10-24 22:53 UTC by fcartegnie
Modified: 2013-08-07 15:13 UTC
CC List: 5 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
sample dataset (31 bytes, text/csv)
2007-11-20 07:29 UTC, fcartegnie
Description fcartegnie 2007-10-24 22:53:05 UTC
CSV import splits lines incorrectly.
If it finds a single 0x13 value, it treats it as the newline separator.

The right behaviour would be to trigger a newline only on 0x10 and on the
0x10 0x13 sequence. 0x13 alone never meant newline on any system.
Comment 1 helenrussian 2007-11-18 16:53:51 UTC
Please attach a sample file.
Comment 2 fcartegnie 2007-11-20 07:29:00 UTC
Created attachment 49762
sample dataset
Comment 3 bormant 2007-11-21 12:52:32 UTC
1) "0x13 alone never meant newline on any system." How about Commodore
machines, the Apple II family and Mac OS up to version 9?
http://en.wikipedia.org/wiki/Line_feed
2) 0x10 and 0x13 are never used as newline. CR=13(dec)=0D(hex), LF=10(dec)=0A
(hex). Win/DOS newline style is the CR LF pair, *nix style is a single LF -- both
are mentioned in RFC 4180 (http://tools.ietf.org/html/rfc4180) as valid line
separators.
3) A single CR is not mentioned in RFC 4180 as a valid line separator. I don't
know how many "single CR separator" files actually exist.
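The byte values in point 2 can be checked with a short Python sketch. Nothing here is OOo code, and the sample bytes are made up for illustration; it only shows that CR is 13 (0x0D), LF is 10 (0x0A), and that a tolerant text reader breaks on all three historical styles.

```python
# Byte values of the two control characters discussed above.
CR = b"\r"  # carriage return
LF = b"\n"  # line feed

print(CR[0], hex(CR[0]))
print(LF[0], hex(LF[0]))

# Made-up sample mixing the three historical newline styles:
# CRLF (Win/DOS), LF (*nix), and a lone CR (classic Mac OS).
sample = b"a,b\r\nc,d\ne,f\rg,h"

# bytes.splitlines() treats CRLF, LF and a lone CR as line boundaries.
lines = sample.splitlines()
print(lines)
```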
Comment 4 fcartegnie 2007-11-21 13:15:41 UTC
Forget my previous comment. I was thinking of it as just newlines in a text file.

Anyway, the RFC mentioned says that a line is terminated by CRLF, which is
"CR LF" (ABNF sequence http://tools.ietf.org/html/rfc2234#section-3.1), not "CR
/ LF" (ABNF alternative). Section 6.1 of RFC 2234 defines it the same way.

So it should break only on the two-byte sequence CRLF, no?
Comment 5 bormant 2007-11-21 14:06:15 UTC
Yes, RFC 4180 specifies the CR LF sequence, not the CR / LF alternative.
However, a CSV file is a *text* file above all, and the community uses the
historical LF/CRLF/CR newlines in text files. So usually we don't know which
system a received file comes from.
Maybe we need an option (enhancement) that controls the import of a single CR:
use it as a newline char (as now) or use it as a line break (as Ctrl+Enter in
Calc and Shift+Enter in Writer).
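A rough sketch of what such an option could look like, in Python rather than Calc's own code. The function name `split_records` and the flag `lone_cr_is_record_break` are hypothetical, and the sketch ignores quoted fields entirely; it only illustrates the two behaviours for a single CR.

```python
import re
from typing import List


def split_records(data: bytes, lone_cr_is_record_break: bool = True) -> List[bytes]:
    """Split raw CSV bytes into records (quoting is deliberately ignored).

    lone_cr_is_record_break=True  : CRLF, LF and a lone CR all end a record.
    lone_cr_is_record_break=False : only CRLF and LF end a record; a lone CR
                                    stays inside the record as data.
    """
    if lone_cr_is_record_break:
        # CRLF is listed first so the pair is consumed as one break.
        pattern = rb"\r\n|\n|\r"
    else:
        pattern = rb"\r\n|\n"
    records = re.split(pattern, data)
    # A trailing newline yields one empty final piece; drop it.
    if records and records[-1] == b"":
        records.pop()
    return records


# Made-up input: a lone CR inside the first line, then CRLF, then LF.
data = b"a,b\rc\r\nd,e\n"
print(split_records(data, lone_cr_is_record_break=True))
print(split_records(data, lone_cr_is_record_break=False))
```

With the flag off, the lone CR is kept inside the first record, where an importer could then render it as a line break in the cell.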
Comment 6 peter.junge 2009-07-22 06:40:17 UTC
Confirmed. Plus, some duplicates exist.
Comment 7 peter.junge 2009-07-22 06:41:22 UTC
*** Issue 81470 has been marked as a duplicate of this issue. ***
Comment 8 peter.junge 2009-07-22 06:43:40 UTC
*** Issue 83768 has been marked as a duplicate of this issue. ***
Comment 9 peter.junge 2009-07-22 06:45:38 UTC
*** Issue 83768 has been marked as a duplicate of this issue. ***
Comment 10 peter.junge 2009-07-22 06:46:54 UTC
*** Issue 98274 has been marked as a duplicate of this issue. ***
Comment 11 peter.junge 2009-07-22 06:48:06 UTC
*** Issue 95958 has been marked as a duplicate of this issue. ***
Comment 12 peter.junge 2009-07-22 07:05:30 UTC
This is an enhancement, not a defect. New record delimiter LF/CRLF (not CR)needs
either be selectable in CSV dialog for both import and export, as RFC 4180 seems
to allow both.
http://tools.ietf.org/html/rfc4180
Comment 13 peter.junge 2009-07-22 07:37:54 UTC
Sorry, phrasing failed completely in previous comment, should be:

This is an enhancement, not a defect. The 'New Record' delimiter LF/CRLF (not
CR!) needs to be selectable in dialogs for both CSV import and export, as RFC
4180 seems to allow both.
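For comparison only, this is how a selectable record delimiter looks on the export side in Python's standard `csv` module, via its `lineterminator` parameter. It is an analogy, not a proposal for the Calc dialog; the helper name `export_csv` and the sample rows are made up.

```python
import csv
import io


def export_csv(rows, terminator="\r\n"):
    """Write rows to a CSV string using the chosen record delimiter."""
    buf = io.StringIO(newline="")  # newline="" disables newline translation
    writer = csv.writer(buf, lineterminator=terminator)
    writer.writerows(rows)
    return buf.getvalue()


rows = [["a", "b"], ["c", "d"]]
print(repr(export_csv(rows, "\r\n")))  # CRLF-terminated records
print(repr(export_csv(rows, "\n")))    # LF-terminated records
```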
Comment 14 Regina Henschel 2009-11-09 18:29:26 UTC
*** Issue 106740 has been marked as a duplicate of this issue. ***