Issue 123928 - Hyperlinks not output in CSV output format
Summary: Hyperlinks not output in CSV output format
Status: CLOSED WONT_FIX
Alias: None
Product: Calc
Classification: Application
Component: save-export (show other issues)
Version: 4.0.1
Hardware: PC All
: P3 Normal (vote)
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-29 04:05 UTC by Alan
Modified: 2023-01-25 17:26 UTC (History)
4 users (show)

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: 4.1.0-dev
Developer Difficulty: ---


Attachments
Image of the Calc spreadsheet content (149.98 KB, image/jpeg)
2013-12-30 05:54 UTC, Alan
no flags Details
Calc Spreadsheet with two rows of data in two columns (265.26 KB, application/zip)
2013-12-30 20:13 UTC, Alan
no flags Details

Note You need to log in before you can comment on or make changes to this issue.
Description Alan 2013-12-29 04:05:20 UTC
I have a spreadsheet that contains hyperlinks. I want to output the data in the spreadsheet (including the hyperlinks) in CSV format. Calc will output the hyperlinks when saving as an HTML file, XML file or as an ODS file but not when saving as a CSV file. Since it apparently saves the hyperlinks in all other output formats but only fails to saves hyperlinks in CSV format; this seems to be a bug.
Comment 1 Rainer Bielefeld 2013-12-29 08:53:29 UTC
No, it's a characteristic [1][2] of csv, what's something like Plain Text [3]

[1] <http://en.wikipedia.org/wiki/Comma-separated_values>
[2] <http://tools.ietf.org/html/rfc4180>
[3] <http://en.wikipedia.org/wiki/Plain_text>
Comment 2 Alan 2013-12-29 18:26:47 UTC
(In reply to Rainer Bielefeld from comment #1)
> No, it's a characteristic [1][2] of csv, what's something like Plain Text [3]
> 
> [1] <http://en.wikipedia.org/wiki/Comma-separated_values>
> [2] <http://tools.ietf.org/html/rfc4180>
> [3] <http://en.wikipedia.org/wiki/Plain_text>

I am very conversant with data formats. So while the references you provided were unnecessary, it is nice to know that you were willing to go to the effort to help someone out.

I was obviously not clear enough in my description of the issue (my fault. I made an assumption that it would be understood that I knew what CSV was...). 

I was not expecting the hyperlink to be output as a hyperlink in the csv file. I was expecting the VALUE of the hyperlink to be output as text. The value of the hyperlink is part of the data within the spreadsheet. When outputting the data from the spreadsheet, I would expect that the hyperlink value would also be output.

If the value of the hyperlink is not output, then I have to maintain DUPLICATE fields in the spreadsheet. One with the hyperlink and a second with the text value of the hyperlink. While I could name the hyperlink with the same value as the hyperlink to avoid duplication; this would make the cell content nearly worthless -- since many URLs provide no clue as to the actual content of the page.
Comment 3 Rainer Bielefeld 2013-12-29 18:47:30 UTC
@Alan:
Please simply attach a test kit showing your problem.
Comment 4 Alan 2013-12-30 05:54:16 UTC
Created attachment 82188 [details]
Image of the Calc spreadsheet content

I could not attach the spreadsheet because the Bugzilla "Browse..." function won't let me. However, the attached JPEG image of the spreadsheet content will hopefully be sufficient to make this clear. The image is of a spreadsheet with two rows. The window over the spreadsheet is the hyperlink menu which shows the value of the hyperlink in the first row, first cell.

I want to output the two rows (of two columns) to CSV format (comma separated plain text). The first column has a hyperlink and the second column is just text. The first column has TWO values for each cell. One is the value displayed in the column and the second (not displayed) is the hyperlink content.

When using "Save As" and outputting to (apparently) anything but CSV, the hyperlink value is output. For CSV it is not output. The hyperlink value is part of the content of the cell and should be output (in my opinion...) since it is data in the spreadsheet and without it the cell content is useless. 

The Calc output of the spreadsheet when using Save As to CSV is:
   open office,some text
   firefox,some other text
and this is useless. The most important content (the hyperlink value) has not been output.

From my perspective, the output to CSV should be:
   open office,http://www.openoffice.org/,some text
   firefox,http://www.mozilla.org/en-US/firefox/new/,some other text

or 
   http://www.openoffice.org/,open office,some text
   http://www.mozilla.org/en-US/firefox/new/,firefox,some other text
Comment 5 Rainer Bielefeld 2013-12-30 08:52:51 UTC
I still haven't a clue what reporter's problem might be. I think he simply has wrong expectations.

> I could not attach the spreadsheet because the Bugzilla "Browse..." function
> won't let me. 

@Alan:
So you do something wrong. But you can send to me sourcedocument.odt and result.csv by email (simply click my name in a comment), I will attach the documents for you.

And your expectation seems wrong. As I told you in Comment 1, 'Save as .csv' only will export the plain text contents, not Hyperlinks, tooltips, Colors, other "hidden" contents, whatever else. 

"I expect" is nothing what will cause any activity here. My general hint is that you should read our bug writing guidelines
<http://qa.openoffice.org/issue_handling/pre_submission.html> and  
<http://wiki.openoffice.org/wiki/QA/HowToFileIssue#Principles>, 
unfortunately important information we will need to reproduce your problem is missing.
Please add all information requested in following:
a) Write a meaningful Summary describing exactly what the problem is
b) Attach a sample document (not only screenshot) or refer to an existing 
  sample document in an other Bug with a link; to attach a file to this 
  bug report, just click on "Add an attachment" right on this page.
  If you want to attach a test kit with multiple documents zip them into
  a single testkit.zip and attach the  testkit.zip
c) Contribute a document related step by step instruction containing every 
  key press and every mouse click how to reproduce your problem 
d) Attach screenshots with comments if you believe that that might explain the 
  problem better than a text comment. Best way is to insert your screenshots
  into a DRAW document and to add comments that explain what you want to show
e) if possible contribute an instruction how to create a sample document 
  from the scratch
f) add information 
  f1) what EXACTLY is unexpected
  f2) and WHY do you believe it's unexpected (cite Help or Documentation!)
  f6) AOO settings that might be related to your problems 
     (video hardware acceleration, ...)
  f9) Whether that worked in former OOO / AOO Versions 
  f10) if a competitor's software (MSO, LibreOffice, ...) fulfills your needs
Comment 6 Alan 2013-12-30 20:13:46 UTC
Created attachment 82192 [details]
Calc Spreadsheet with two rows of data in two columns

I have attached a zip file with six items. The readme.txt file contains basically what is in this comment. bugzillaError.jpg shows the error I get when attempting to attach an .ods file using the "Browse..." function of the Bugzilla WEB page. The other five files demonstrate the issue.

The spreadsheet testSheet.ods has one row and two columns. The first column contains text (visible) and a hyperlink (hidden). The second column contains text.

If I output as XML (using Save As) the hyperlink value is output to the XML file (testSheet.xml) as shown by this content in the XML file:
    <Cell ss:HRef="http://www.openoffice.org/"><Data ss:Type="String">open office</Data></Cell>

If I output as HTML (using Save As) the hyperlink value is output to the HTML file (testSheet.html) as shown by this content in the HTML file:
    <TD WIDTH=84 HEIGHT=18 ALIGN=LEFT><A HREF="http://www.openoffice.org/">open office</A></TD>

If I output as CSV (using Save As) the hyperlink value is NOT output to the CSV file (testSheet.csv) as shown by this content in the CSV file:
    open office,some text

Hidden values are being output to XML and HTML as part of the cell contents. So saying that hidden values are not output is incorrect. They are output. My issue is that they are not output when the output format is CSV.
Comment 7 Rainer Bielefeld 2014-01-01 16:49:11 UTC
@Alan
Your expectation is wrong, does not match with [1], [2], [3], other users' expectations.
Comment 8 Alan 2014-01-02 00:10:45 UTC
Ref [1] does not specify that hypertext link values are not output. 

Ref [2] RFC4180 does NOT specify what fields are or are not output in CSV. It specifies the formatting of the lines and that the content is plain text. 

Ref [3] states: "Files that contain markup or other meta-data are generally considered plain-text"


So meta-data is considered (by the references that you provided) to be plain text. The hyperlink value is meta-data for the cell. You output the meta-data to other formats (XML & HTML) -- you even output cell comments!!

The only explicitly stated exclusion of output from a spreadsheet (in any of the provided references) is FORMULAs. The hypertext link value is not a formula, it is TEXT. By refs [1], [2] and [3], the hyperlink value should be output. 

As to stating that users do not expect the hyperlink value to be output, that is opinion, not fact. There has been at least one other written complaint about Calc not outputting the hyperlink value in CSV format. I think you will find that most people who need the hyperlink value work around the bug by using either HTML or XML output.

Sorry but failure to output a TEXT value that is meta-data for a spreadsheet cell is a bug. The hyperlink is text, it is a value of the cell, it is plain text, it is meta-data, and it is inconsistent that Calc outputs meta-data for HTML and XML but not CSV.
Comment 9 Edwin Sharp 2014-01-02 09:45:20 UTC
For sure not a defect.
I couldn't output both text and link to csv in Excel 2010, Calligra Sheets 2.6.4 and Gnumeric 1.10.16.

AOO410m1(Build:9750)  -  Rev. 1552994
Rev.1552994
Win 7
Comment 10 Alan 2014-01-02 18:17:06 UTC
So, since other spreadsheets handle exporting of meta-data to CSV incorrectly, OO Calc should too?

Is this like Windows placing the Shut-Down function on the Start menu? Once someone does it wrong everyone should think it is the right way to do it in the name of maintaining "compatibility"?

Consider this topic closed. I will do what everyone else does and work around the ?feature?/?defect?/?bug?. I will output as XML, extract the fields, reformat it to CSV, and then input the CSV to my other processes.

Thank you for taking the time to respond!! It is appreciated, even though I am not happy with the outcome.
Comment 11 damjan 2023-01-23 04:28:28 UTC
The problem is badly defined. A hyperlink exist within a cell's text, and multiple hyperlinks can exist in this text. Eg. a cell's text could be "_Click_here_for_website1_  or _click_here_for_website2_", and each of those could link to a different website, all within one cell.

How should that be converted to CSV? By adding extra columns after cells with hyperlinks, with each hyperlink in own column? By changing the text to add a space and hyperlink with brackets within the field? There is just no standard, and there are different ways of doing it, and once exported it wouldn't be imported into the original form, so users that need it would be served best by their own custom solution, such a macro that does custom conversion of cells with hyperlinks before export.

Should this be RESOLVED WONT_FIX? Or what solution do you propose?
Comment 12 Alan 2023-01-23 06:10:30 UTC
The "Save As..." dialog should have a check box to output hyperlink values. When checked, the hyperlink value should be output as a separate field immediately following the cell's text value. For example: ..., amazon site, https://www.amazon.com, ...

The default should be NOT to output hyperlinks so that those moving data from one spreadsheet to another will not have unexpected results. Only those who need the hyperlinks will choose to output them by checking that option.

I fully understand that if the hyperlink value is read back into a spreadsheet it will appear in a separate cell and will not be a hyperlink. This is NOT an issue.

The purpose for output of the hyperlink's value is for processing external to the spreadsheet.

Note that both HTML and XML output output the hyperlink. It is inconsistent that the hyperlink is output for these formats but not for CSV.

A question that arises is: Should the dialog option to save the hyperlinks also apply to HTML and XML output? Or do you leave that as is?
Comment 13 damjan 2023-01-24 19:12:29 UTC
This is something custom and specific to your workflow, probably best solved with a macro.

I've made a macro that you can use, please see:
https://forum.openoffice.org/en/forum/viewtopic.php?t=109327
and please continue the discussion there.
Comment 14 Alan 2023-01-25 00:34:06 UTC
The workaround Basic macro, provided by damjan@apache.org in Comment 13, will accomplish the output of the hyperlink. While this still leaves OO CSV output different from the XML and HTML output, those who need the hyperlinks in CSV format can output them via this Basic macro.

This is resolved.
Comment 15 damjan 2023-01-25 06:22:06 UTC
The resolution isn't FIXED, it should be FIXED_WITHOUT_CODE or WONTFIX.

Thank you for your bug report.
Comment 16 Marcus 2023-01-25 17:26:15 UTC
Closing this issue as we will not fix it directly in Apache OpenOffice.

If you need a fix please refer to the macro in comment #13.