Apache OpenOffice (AOO) Bugzilla – Issue 60110
import from csv file sometimes strips initial apostrophe in cell
Last modified: 2017-05-20 11:11:41 UTC
OpenOffice improperly imports a field in a CSV file that contains only a single quote. I am able to get it to import the field properly if the field contains two single quotes enclosed in a set of double quotes. The test file at the URL above is an example file that shows this behavior. Excel doesn't exhibit this behavior. From what I read of RFC 4180, it looks like OpenOffice is not RFC compliant. That said, implementing CSV doesn't seem to be straightforward either (there seem to be several interpretations of how a CSV file should be formatted). However, given that OpenOffice is an office suite and strives to be compatible with Excel, I think its behavior should be similar to Excel's.
Hi, I'm sorry, but I don't see the problem in the file you've mentioned. Please describe more precisely where the problem occurs. A smaller file would also help to make the point. Frank
Set needmoreinfo keyword. Perhaps this is related to the use of a single quote to denote a number which should be displayed as text? Or to other bugs surrounding the use of single quotes (see for example issue 65510)? Steve
BTW, the URL for the csv file is broken.
The specific problem is that OOo Calc sometimes strips the initial apostrophe in a cell after a CSV import, even if that apostrophe is enclosed in double quotes (normally denoting an exact text import). Interestingly, the apostrophe seems to be stripped in all cases except when followed immediately by a number. Even more interestingly, the cell contents appear properly in the import preview. Once the import takes place, however, the error occurs and the apostrophes get stripped. I will attach an example.
Created attachment 37545 [details] CSV file with examples of various possible cases of apostrophe imports
The contents of the CSV:

    WITHOUT QUOTES
    1 apostrophe,'
    2 apostrophes,''
    3 apostrophes,'''
    A number,'3
    A word,'word
    A misspelled word,'mword
    WITH QUOTES
    1 apostrophe,"'"
    2 apostrophes,"''"
    3 apostrophes,"'''"
    A number,"'3"
    A word,"'word"
    A misspelled word,"'mword"

In all these cases except '3 and "'3" the apostrophe is stripped after the import to Excel. This behavior is very similar to behavior described in issue 65510; I suspect a dependency on 65510 and have marked accordingly. Also added ms_interoperability keyword because Excel treats CSV imports of apostrophes differently:
* In Calc, "''" is required to import a single apostrophe due to the stripping of the initial apostrophe
* In Excel, only "'" is required to import a single apostrophe
> In all these cases except '3 and "'3" the apostrophe is stripped after the
> import to Excel.

Excuse the error; this line actually describes the behavior in CALC, not Excel. Thus it should read: "In all these cases except '3 and "'3" the apostrophe is stripped after the import to Calc." -SF
>> From a personal email from hsorenson, posted w/ permission:

> BTW, the URL for the csv file is broken.

I've put this back in place. My website changed and it wasn't on the new one. http://www.nosneros.net/hso/test.csv

OpenOffice imports CSV differently than Excel. The RFC, unfortunately, doesn't disambiguate in such a way as to say which implementation is correct.
OpenOffice needs "''" to import an apostrophe (\x27) from CSV.
Excel needs "'" to import an apostrophe (\x27) from CSV.
It would be nice if OpenOffice followed Excel's lead since Excel has a larger install base. I use both and having this inconsistency is annoying. -Holt
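For comparison, here is a minimal sketch of what a plain CSV parser yields for the quoted rows of the attachment, before any spreadsheet-style interpretation is applied. It uses Python's standard csv module, not Calc or Excel code; the names SAMPLE and read_values are made up for this illustration.

```python
import csv
import io

# Quoted rows taken from the attachment described in this issue.
SAMPLE = """\
1 apostrophe,"'"
2 apostrophes,"''"
A number,"'3"
A word,"'word"
"""

def read_values(text):
    """Return the second field of every row, with CSV double quotes removed."""
    return [row[1] for row in csv.reader(io.StringIO(text))]

if __name__ == "__main__":
    for value in read_values(SAMPLE):
        # repr() makes the surviving apostrophes visible.
        print(repr(value))
```

In this parser the apostrophe is ordinary field data, so the field "'" comes back as a single apostrophe. Any stripping would have to happen in a later interpretation step.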
Hi eike, please have a look at this one. Frank
The CSV import currently interprets the field content the same way as if it were keyed in as input, with the exception of a single apostrophe as field content, thus forcing otherwise numerical content to textual content, discarding the leading apostrophe. This should be disabled for CSV import and field content taken as is. Btw, it is a common misconception that field content quoted by double quotes should always be textual content; this is _not_ the case. Double quotes are to be removed and then it is up to the application to interpret the content. Otherwise it would be impossible to have numerical values contain the field separator. This issue is related to 65510 for the leading apostrophe handling, but doesn't depend on it in the sense that issue 65510 would block this issue, so I am removing the dependency. Changing target to 2.x because of desired lossless data import.
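The two-step model described here (remove the double quotes first, then let the application interpret the content) could be sketched as follows. This is an illustration in Python, not the Calc implementation; parse_fields and interpret_as_is are hypothetical names, and treating a comma as a thousands separator is an assumption made only so that a quoted numerical value can contain the field separator.

```python
import csv
import io

def parse_fields(text):
    """Step 1: split rows into fields and remove the CSV double quotes."""
    return list(csv.reader(io.StringIO(text)))

def interpret_as_is(field):
    """Step 2: interpret the unquoted content.

    Assumption for this sketch: a comma inside a field is a thousands
    separator. Anything that is not a number is kept as text unchanged,
    including a leading apostrophe.
    """
    try:
        return float(field.replace(",", ""))
    except ValueError:
        return field

if __name__ == "__main__":
    rows = parse_fields('total,"1,234"\nlabel,"\'word"\n')
    for name, raw in rows:
        print(name, repr(interpret_as_is(raw)))
```

Here the quoted field "1,234" still becomes a number, while "'word" keeps its apostrophe because the content is taken as is.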
change target from 2.x to 3.x according to http://wiki.services.openoffice.org/wiki/Target_3x
Reset assignee to the default "issues@openoffice.apache.org".