122822 – Correct viewing of XY-, Column- and Line-Charts limited to 10000 records + 1 Heading row

Status:	CLOSED FIXED

Alias:	None

Product:	Calc
Classification:	Application
Component:	viewing
Version:	4.0.0
Hardware:	PC Windows 7

Importance:	P3 Major
Target Milestone:	4.0.1
Assignee:	AOO issues mailing list
QA Contact:	Susi

URL:
Keywords:	regression

Duplicates (2):	122907 123284
Depends on:	121058
Blocks:

Reported:	2013-07-24 14:01 UTC by rohner
Modified:	2017-05-20 10:33 UTC
CC List:	11 users

See Also:
Issue Type:	DEFECT
Latest Confirmation in:	---
Developer Difficulty:	---

Flags:	jsc: 4.0.1_release_blocker+

Attachments
AOO 4.0.0 -> bad (166.03 KB, image/png) 2013-07-24 14:02 UTC, rohner	no flags	Details
AOO 3.4.1 -> good (177.49 KB, image/png) 2013-07-24 14:03 UTC, rohner	no flags	Details
some data removed to be below 1000 KB, still the same behaviour (859.92 KB, application/vnd.oasis.opendocument.spreadsheet) 2013-07-24 14:14 UTC, rohner	no flags	Details
Simpe Sample with Line Chart (859.92 KB, application/x-vnd.oasis.opendocument.spreadsheet) 2013-07-24 16:57 UTC, Rainer Bielefeld	no flags	Details
patch for fixing up revision 1388440 (1.48 KB, patch) 2013-07-25 15:33 UTC, hdu@apache.org	no flags	Details \| Diff
updated patch to work better with non-numeric cells (1.75 KB, patch) 2013-07-25 17:05 UTC, hdu@apache.org	no flags	Details \| Diff
Bad rendered cloud (270.95 KB, application/vnd.oasis.opendocument.spreadsheet) 2013-07-25 17:33 UTC, Regina Henschel	no flags	Details
how other programs handle large datasets -> skip points (37.64 KB, image/png) 2013-08-14 08:54 UTC, rohner	no flags	Details
Fix patch (3.95 KB, patch) 2013-08-20 09:06 UTC, Clarence GUO	no flags	Details \| Diff
Fix patch (4.14 KB, patch) 2013-08-21 08:41 UTC, Clarence GUO	clarence.guo.bj: review?	Details \| Diff
Patch for only reverting 121058 (2.71 KB, patch) 2013-08-27 23:16 UTC, Clarence GUO	clarence.guo.bj: review?	Details \| Diff

Description rohner 2013-07-24 14:01:52 UTC

Large XY charts are not displayed correctly in AOO 4.0.0, whereas they are displayed correctly in AOO 3.4.1. The file was created with AOO 3.4.1.

See attachments datarange = $A$3:$A$49100 (works on AOO3.4.1)

Reducing the data range to $A$3:$A$19100 still doesn't work with AOO 4.0.0.
Reducing the data range to $A$3:$A$9100 works with AOO 4.0.0.

Very inconvenient if files from earlier AOO versions are not fully supported with newer AOO versions...

Comment 1 rohner 2013-07-24 14:02:40 UTC

Created attachment 81132 [details]
AOO 4.0.0 -> bad

Comment 2 rohner 2013-07-24 14:03:22 UTC

Created attachment 81133 [details]
AOO 3.4.1 -> good

Comment 3 rohner 2013-07-24 14:04:43 UTC

Tested with Win7-64 and WinXP-32 bit...

Comment 4 Regina Henschel 2013-07-24 14:07:53 UTC

Please attach the file itself. Please remove all parts that are not needed to reproduce the bug.

Comment 5 Rainer Bielefeld 2013-07-24 14:11:44 UTC

@rohner@tofwerk.com:
And if the remaining part is still confidential (or if you do not have 3.4.x available to edit the document), you can send it to Regina or to me by email.

Comment 6 rohner 2013-07-24 14:14:05 UTC

Created attachment 81134 [details]
some data removed to be below 1000 KB, still the same behaviour

Comment 7 rohner 2013-07-24 14:21:32 UTC

not confidential I hope - wow fast response!

Comment 8 Regina Henschel 2013-07-24 15:58:55 UTC

It is correct till range A3:B10001 and broken for A3:B10002 and higher.

I have enlarged the chart to 200cmx50cm so that the line segments are not too small but visible. But the error still occurs.

Comment 9 Regina Henschel 2013-07-24 16:11:36 UTC

I see the error already in r1432130 (~January).

Comment 10 Rainer Bielefeld 2013-07-24 16:45:17 UTC

I can confirm Regina's results; there is some magic limit after 10000 data records.

It is indeed a viewing problem: charts in a document saved from AOO4 look fine when reopened in 3.4.

Comment 11 Rainer Bielefeld 2013-07-24 16:57:36 UTC

Created attachment 81136 [details]
Simpe Sample with Line Chart

Line Charts also are affected. Steps how to reproduce with attached sample:
1. Open sample2.ods from AOO Start Center
2. Double click Chart -> Rightclick -> Dataranges
3. Change "$Tabelle1.$A$1:$B$10001" to "$Tabelle1.$A$1:$B$10001" <ok>
   > Sinus reappears

Additional info:
----------------
a) Column Charts are also affected - maybe all Charts?
b) I replaced the old sample documents with a simpler one, see the steps above
   for how to use it

Comment 12 Rainer Bielefeld 2013-07-24 16:58:59 UTC

Arg, in step 3 (comment above) please replace by "$Tabelle1.$A$1:$B$10000", of course!

Comment 13 Regina Henschel 2013-07-25 11:10:33 UTC

I have narrowed it to:
OK in r1384746  (around 2012-09-17)
broken in r1388589 (around 2012-09-21)

Comment 14 hdu@apache.org 2013-07-25 13:56:54 UTC

Thanks for narrowing it down! Looking through the commits, I saw that the change to main/sc/source/ui/unoobj/chart2uno.cxx in revision 1388440 contains an explicit check for the magic number 10000.

Comment 15 hdu@apache.org 2013-07-25 15:33:14 UTC

Created attachment 81142 [details]
patch for fixing up revision 1388440

This suggested patch fixes the code changed in revision 1388440 (introduced for bug 121058) so that it
- works with double values instead of integers only
- properly initializes the min..max range for each step

Comment 16 hdu@apache.org 2013-07-25 17:05:46 UTC

Created attachment 81144 [details]
updated patch to work better with non-numeric cells

Comment 17 Regina Henschel 2013-07-25 17:30:16 UTC

Your (first) solution identifies the errors correctly. I have applied the patch and the curve is now drawn reasonably well.

But I'm not happy with the solution for issue 121058 at all.

For a continuous source, using min and max results in two data points lying close together, followed by a large gap. This happens because the max of one part is near the min of the next part. As a result, the selected points are not evenly distributed. Using random points might be better. Using regular distances might coincide with a regular pattern in the data.
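
A minimal sketch of the effect described above, assuming a simplified model of the reduction (fixed-size groups, keeping the index of the minimum and the maximum of each group). The function name `pick_min_max_per_group` and the group size of 50 are illustrative only and are not taken from the AOO source.

```python
def pick_min_max_per_group(values, group_size):
    """Return the indices of the min and max value of each consecutive group."""
    picked = []
    for start in range(0, len(values), group_size):
        group = range(start, min(start + group_size, len(values)))
        lo = min(group, key=lambda i: values[i])
        hi = max(group, key=lambda i: values[i])
        # A set avoids a duplicate index when a group has a single element.
        picked.extend(sorted({lo, hi}))
    return picked


values = list(range(200))  # a steadily rising, continuous source
indices = pick_min_max_per_group(values, 50)
gaps = [b - a for a, b in zip(indices, indices[1:])]

print(indices)  # [0, 49, 50, 99, 100, 149, 150, 199]
print(gaps)     # [49, 1, 49, 1, 49, 1, 49]
```

The alternating gaps of 49 and 1 are the "two points near each other, then a large gap" pattern: the max of one group sits right next to the min of the following group.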

For a point cloud, using min and max might result in a totally wrong image.

In addition, the data labels are wrong. You do not get the original data labels; instead, the continuous first part of the data label column is mapped to the newly created data point sequence.

The patch for issue 121058 was introduced because an Excel file opened too slowly. But ods files open fairly fast. Of course you have to wait a few minutes, but we are considering files with over 10000 data points. The problem is in recalculating the chart. So a solution for issue 121058 might be to not recalculate the chart on opening, but only on change or on demand.

Comment 18 Regina Henschel 2013-07-25 17:33:55 UTC

Created attachment 81145 [details]
Bad rendered cloud

The attached file shows a randomly filled cloud. The points should more or less cover a rectangle. You can reset the range to fewer than 10000 data points to see how it should look.

You can see the wrong data labels if you activate the chart and hover with the mouse over a data point. The point information is shown as a tooltip.

Comment 19 hdu@apache.org 2013-07-26 08:40:44 UTC

With the spreadsheet component having been extended to handle more than a million rows, the idea of issue 121058 to reduce the number of displayed items has its merits. Your random cloud example shows that the current behavior has some serious problems, though:
1. the current reduction is by a constant factor of 50 when 10000 entries are involved
2. only the minimum and maximum values are considered interesting, other values are completely ignored
3. the point information displayed in the tooltip no longer matches

Regarding 1.: the reduction should be graceful, limiting the number of displayed items to e.g. 2500 and calculating the reduction factor from there. Maybe that reduction target needs to be adjustable per chart.

Regarding 2.: the minimum and maximum numbers are most interesting, but other values must not be ignored. Choosing a random item other than the min or max items is a good idea.

Regarding 3.: when reduction is active, the tooltip annotation needs to be disabled.

Comment 20 Regina Henschel 2013-07-31 10:39:48 UTC

*** Issue 122907 has been marked as a duplicate of this issue. ***

Comment 21 Oliver-Rainer Wittmann 2013-08-09 12:54:52 UTC

adding keyword regression

Comment 22 rohner 2013-08-14 08:53:10 UTC

Regarding 2: I disagree - in x-y plots the min and max values are not special and therefore not of any interest if one is not plotting the whole data-set (except maybe for calculating the automatic y-scale which probably should be the same for the full and reduced-data-set) 

See attachment "origin-plot-large-data.png" for how (this old version) handles large datasets...


(In reply to hdu@apache.org from comment #19)
> With the spreadsheet component having been extended to handle more than a
> million rows the idea of issue 121058 to reduce the number of displayed
> items has its merits. Your random cloud example shows that the current
> behavior has some serious problems though:
> 1. the current reduction is by a constant factor of 50 when 10000 entries
> are involved
> 2. only the minimum and maximum values are considered interesting, other
> values are completely ignored
> 3. the point information displayed in the tooltip does no longer match
> 
> Regarding 1. it should be reduced gracefully by limiting the number of
> displayed item to e.g. 2500 and calculating the reduction factor from there.
> Maybe the that reduction target number needs to be adjustable per chart.
> 
> Regarding 2. the minimum and maximum numbers are most interesting but other
> values must not be ignored. Choosing a random item other than the min or max
> items is a good idea.
> 
> Regarding 3. when reduction is active the tooltip annotation needs to
> disabled

Comment 23 rohner 2013-08-14 08:54:05 UTC

Created attachment 81328 [details]
how other programs handle large datasets -> skip points

Comment 24 jsc 2013-08-15 09:22:26 UTC

approve showstopper request

Comment 25 Clarence GUO 2013-08-20 09:06:15 UTC

Created attachment 81359 [details]
Fix patch

The root cause of this defect lies in the fix for 121058: only some points were picked, in order to save loading time and memory for tens of thousands of data points. All data points were divided into many small groups, and only the min and max points were picked from each group. The reason for picking only the min and max points instead of using a regular distance is this: if, for example, in a column or line chart most data points lie near a max value of 100 and a min value of 10, the chart keeps its original outline only if we pick the max and min values from each group. If we used a regular distance, we might get only mid-range values (for example 50) and miss the chart outline.
But the mechanism never considered scatter or bubble charts, which have multiple data sequences in one series. For example, in a scatter chart one series has two sequences, x values and y values, and each x value must be paired with its y value. When x values were picked, a min and a max were picked; when y values were picked, another min and max were picked. However, the picked min and max x values probably do not belong to the same pairs as the picked min and max y values. For example, if the indices of the min and max x values are 10 and 40, the y values No. 10 and No. 40 should be used, but No. 30 and No. 35 might be picked instead. In this case the chart data are totally corrupt. That is the root cause.
My fix rolls back the 121058 code in ScChart2DataSequence::getNumericalData and adds new code in ScChart2DataSequence::BuildDataCache(), which also fixes the wrong axes data label issue mentioned by Regina Henschel. The new code picks points at a regular distance, which fixes the problem for scatter charts and other charts that have multiple sequences in one series. If the distance is small enough, the aforementioned chart outline issue is not a problem, so I use 5 as the regular distance.
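
The pairing problem can be illustrated with a small model. This is a sketch, not the code from ScChart2DataSequence: it assumes each sequence is reduced on its own by keeping the min and max of each group, and the helper names, sample data and strides are made up for the illustration.

```python
def min_max_values(seq, group_size):
    """Reduce ONE sequence on its own: keep the min and max of each group."""
    out = []
    for start in range(0, len(seq), group_size):
        group = seq[start:start + group_size]
        out.extend([min(group), max(group)])
    return out


def every_nth(seq, step):
    """Reduce a sequence by a regular distance; the same indices for any sequence."""
    return seq[::step]


xs = [3, 1, 4, 2, 8, 6, 7, 5]
ys = [10, 40, 20, 30, 50, 80, 60, 70]
original = set(zip(xs, ys))

# x and y reduced independently: the picked values come from different rows.
broken = list(zip(min_max_values(xs, 4), min_max_values(ys, 4)))
# x and y reduced with the same stride: rows stay together.
strided = list(zip(every_nth(xs, 2), every_nth(ys, 2)))

print(broken)   # [(1, 10), (4, 40), (5, 50), (8, 80)]
print(strided)  # [(3, 10), (4, 20), (8, 50), (7, 60)]
print([p in original for p in broken])   # [False, False, False, False]
print([p in original for p in strided])  # [True, True, True, True]
```

In this model none of the points produced by the independent min/max reduction exists in the source data, while every point kept by the regular-distance reduction is an original (x, y) pair.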

Comment 26 Clarence GUO 2013-08-21 08:05:00 UTC

Comment on attachment 81359 [details]
Fix patch

I found a problem with this patch, so I am marking it obsolete.
I'll upload a new patch soon.

Comment 27 Clarence GUO 2013-08-21 08:41:30 UTC

Created attachment 81367 [details]
Fix patch

The earlier fix patch has a problem: it cannot fix the hover tip problem mentioned by Regina Henschel.
The hover tip always uses the drawing shape's name, and that name comes from the index assigned when the shapes are rendered. So if data points are picked before rendering, the index always runs from 1 to the number of picked points.
To solve this problem, the picking must happen on the view side, that is, during rendering.
As each chart type has its own rendering, my fix is only in AreaChart.cxx. Area, line, XY and radar charts share one implementation in this file. I tested the sample file from 121058, which is a line chart, and converted it to area, XY and radar charts; all of them have performance problems. So my fix covers only these chart types. If other chart types with huge data have performance problems, they should be fixed separately.
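
The difference between picking before rendering and picking during rendering can be sketched as follows. This is a simplified model, not the AreaChart.cxx code: it assumes the hover tip label is just the 1-based index a point has when it is drawn, and both function names are hypothetical.

```python
def reduce_before_render(points, step):
    """Data-side picking: the renderer only sees the reduced list,
    so the labels it assigns restart at 1..n."""
    reduced = points[::step]
    return [(i + 1, p) for i, p in enumerate(reduced)]


def reduce_at_render(points, step):
    """View-side picking: the renderer walks all points, skips most of them,
    and labels the drawn ones with their original position."""
    return [(i + 1, p) for i, p in enumerate(points) if i % step == 0]


points = [(x, x * x) for x in range(10)]

print(reduce_before_render(points, 5))  # [(1, (0, 0)), (2, (5, 25))]
print(reduce_at_render(points, 5))      # [(1, (0, 0)), (6, (5, 25))]
```

Both variants draw the same two points, but only the view-side variant labels the second one as point 6, which is its position in the source data.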

Comment 28 Regina Henschel 2013-08-25 17:14:18 UTC

I have done some further tests. I removed the changes from bug 121058 and did no reduction of points, so that the relevant code is the same as in OOo3.4.1; the rest is version r1512966.

The file https://issues.apache.org/ooo/attachment.cgi?id=79599 from bug 121058 opens as well as it does in OOo3.4.1, in about 90 seconds, but the reaction to a single click increases from about 60 seconds in OOo3.4.1 to 3 minutes.

The file attached here opens in 15 seconds and reacts immediately to a single click in both cases.

The file http://people.apache.org/~regina/Bug_122822_and_121058_XYChart_SinCosTan.ods opens in 90 seconds and reacts very slowly to a single click in both cases.

The file http://people.apache.org/~regina/Bug_122822_and_121058_CloudLarge_60000Values.ods crashes in both cases after about 45 seconds, in r1512966 with "bad allocation" error message.

I suggest that for AOO4.0.1 the changes from bug 121058 are reverted and nothing else is changed. There is no real regression compared to OOo3.4.1.

For AOO4.1 the reason for the bad performance and for the crash should be found and fixed. And if that is not possible, a reduction of points should be done, but with a UI so that the user can set the threshold.

My debug build crashes already with fewer data points.

All tests were done on Windows 7.

Comment 29 Clarence GUO 2013-08-27 23:16:37 UTC

Created attachment 81400 [details]
Patch for only reverting 121058

This new patch only reverts the code from 121058.

Comment 30 SVN Robot 2013-08-28 07:08:51 UTC

"hdu" committed SVN revision 1518091 into trunk:
#i122822# revert fix for issue 121058

Comment 31 SVN Robot 2013-08-28 07:10:19 UTC

"hdu" committed SVN revision 1518092 into branches/AOO401:
#i122822# revert fix for issue 121058

Comment 32 hdu@apache.org 2013-08-28 07:19:54 UTC

Reverting the original change in trunk and AOO401. For AOO 4.x there'll be a followup task for reducing the sample count on the view side.

Comment 33 fanyuzhen 2013-08-30 14:07:46 UTC

The latest build in wiki is AOO401m1(Build:9710)  -  Rev. 1516414, wait for revision 1518092 for verification

Comment 34 fanyuzhen 2013-09-17 11:13:15 UTC

It's verified fixed in build AOO401m4(Build:9713)  -  Rev. 1521921 with attached sample " Simpe Sample with Line Chart"

Comment 35 Rainer Bielefeld 2013-09-19 04:29:43 UTC

*** Issue 123284 has been marked as a duplicate of this issue. ***