Apache OpenOffice (AOO) Bugzilla – Issue 122822
Correct viewing of XY-, Column- and Line-Charts limited to 10000 records + 1 Heading row
Last modified: 2017-05-20 10:33:41 UTC
Large XY charts are not displayed correctly with AOO 4.0.0, whereas they are fine in AOO 3.4.1. The file was created with AOO 3.4.1; see attachments. Data range = $A$3:$A$49100 works in AOO 3.4.1. Reducing the data range to $A$3:$A$19100 still doesn't work with AOO 4.0.0. Reducing the data range to $A$3:$A$9100 works with AOO 4.0.0. It is very inconvenient if files from earlier AOO versions are not fully supported by newer AOO versions...
Created attachment 81132 [details] AOO 4.0.0 -> bad
Created attachment 81133 [details] AOO 3.4.1 -> good
Tested with Win7-64 and WinXP-32 bit...
Please attach the file itself. Please remove all parts that are not needed to reproduce the bug.
@rohner@tofwerk.com: And if the remaining part is still confidential (or if you do not have 3.4.x available to edit the document), you can send it to Regina or to me by email.
Created attachment 81134 [details] some data removed to be below 1000 KB, still the same behaviour
not confidential I hope - wow fast response!
It is correct till range A3:B10001 and broken for A3:B10002 and higher. I have enlarged the chart to 200cmx50cm so that the line segments are not too small but visible. But the error still occurs.
I see the error already in r1432130 (~January).
I can confirm Regina's results: there is some magic after 10000 data records. It is indeed a viewing problem; charts in a document saved from AOO4 look fine when reopened in 3.4.
Created attachment 81136 [details] Simpe Sample with Line Chart
Line charts are also affected.
Steps to reproduce with the attached sample:
1. Open sample2.ods from the AOO Start Center
2. Double-click the chart -> right-click -> Data Ranges
3. Change "$Tabelle1.$A$1:$B$10001" to "$Tabelle1.$A$1:$B$10001" <ok>
> The sine curve reappears
Additional info:
a) Column charts are also affected; maybe all charts?
b) I replaced the old sample documents with a simpler one; see the comment above for how to use it
Arg, in step 3 (comment above) please replace by "$Tabelle1.$A$1:$B$10000", of course!
I have narrowed it to: OK in r1384746 (around 2012-09-17) broken in r1388589 (around 2012-09-21)
Thanks for narrowing it down! Looking through the commits I saw that in revision 1388440 change to main/sc/source/ui/unoobj/chart2uno.cxx there is an explicit check for the magical 10000 number.
Created attachment 81142 [details] patch for fixing up revision 1388440
This patch suggestion fixes the code changed in revision 1388440 (introduced by bug 121058) so that it
- works with double values instead of integers only
- properly initializes the min..max range for each step
Created attachment 81144 [details] updated patch to work better with non-numeric cells
Your (first) solution identifies the errors correctly. I have applied the patch and the curve is now drawn reasonably well. But I'm not happy with the solution for issue 121058 at all.

For a continuous source, using min and max results in two data points lying close together, followed by a large gap. This happens because the max of one part is near the min of the next part, so you do not get an even distribution of the selected points. Using random points might be better. Using regular distances might hit regular patterns in the data. For a point cloud, using min and max might result in a totally wrong image. In addition, the data labels are wrong: you do not get the original data labels; instead, the continuous first part of the data label column is mapped to the newly created data point sequence.

The patch for issue 121058 was introduced because an Excel file opened too slowly. But ods files open fairly fast. Of course you have to wait a few minutes, but we are talking about files with over 10000 data points. The problem is in recalculating the chart. So a solution for issue 121058 might be not to recalculate the chart on opening, but only on change or on demand.
Created attachment 81145 [details] Bad rendered cloud
The attached file shows a randomly filled cloud. The points should more or less cover a rectangle. You can reset the range to under 10000 data points to see how it should look. You can see the wrong data labels if you activate the chart and hover with the mouse over a data point. The point information is shown as a tooltip.
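To illustrate the uneven spacing described above, here is a minimal Python sketch. It is not the code from chart2uno.cxx; the function name `minmax_indices` and the group size of 50 are assumptions chosen for illustration. For monotonically increasing data, the max of one group sits right next to the min of the following group, so the selected points come in tight pairs separated by large gaps.

```python
def minmax_indices(values, group_size):
    """Return the indices of the min and max of each consecutive group.

    Hypothetical helper for illustration only; it mimics the idea of
    keeping just the extreme values of every group of data points.
    """
    picked = []
    for start in range(0, len(values), group_size):
        group = range(start, min(start + group_size, len(values)))
        lo = min(group, key=lambda i: values[i])
        hi = max(group, key=lambda i: values[i])
        # Keep the two indices in their original order, without duplicates.
        picked.extend(sorted({lo, hi}))
    return picked


# A continuous, steadily rising source: 200 points, groups of 50.
values = list(range(200))
idx = minmax_indices(values, 50)

# Distance between neighbouring selected points.
gaps = [b - a for a, b in zip(idx, idx[1:])]

print(idx)   # selected indices
print(gaps)  # alternating large and small gaps
```

The gaps alternate between 49 and 1, which is the "two points near each other, then a large gap" pattern.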
With the spreadsheet component having been extended to handle more than a million rows, the idea of issue 121058 to reduce the number of displayed items has its merits. Your random cloud example shows that the current behavior has some serious problems though:
1. the current reduction is by a constant factor of 50 when 10000 entries are involved
2. only the minimum and maximum values are considered interesting; other values are completely ignored
3. the point information displayed in the tooltip no longer matches

Regarding 1. it should be reduced gracefully by limiting the number of displayed items to e.g. 2500 and calculating the reduction factor from there. Maybe that reduction target number needs to be adjustable per chart.

Regarding 2. the minimum and maximum numbers are most interesting, but other values must not be ignored. Choosing a random item other than the min or max items is a good idea.

Regarding 3. when reduction is active, the tooltip annotation needs to be disabled.
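The "graceful" reduction suggested in point 1 could look roughly like the following Python sketch. The names `reduction_factor` and `reduce_points` are hypothetical and nothing here is taken from the AOO sources; the target of 2500 is the example value from the comment above. The factor is derived from the data size instead of being a fixed 50.

```python
def reduction_factor(n_points, target=2500):
    """Smallest stride that leaves at most `target` points (hypothetical helper)."""
    if n_points <= target:
        return 1  # small data sets are shown in full
    # Integer ceiling division, avoids floating point rounding.
    return (n_points + target - 1) // target


def reduce_points(points, target=2500):
    """Keep every k-th point, where k is computed from the target count."""
    return points[::reduction_factor(len(points), target)]


for n in (2500, 10000, 10002, 1_000_000):
    k = reduction_factor(n)
    shown = len(reduce_points(list(range(n))))
    print(n, "points -> factor", k, "->", shown, "shown")
```

With this approach a chart just over the threshold loses far fewer points than with a constant factor, and the per-chart target could be passed in as the `target` argument.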
*** Issue 122907 has been marked as a duplicate of this issue. ***
adding keyword regression
Regarding 2: I disagree. In x-y plots the min and max values are not special and therefore of no particular interest if one is not plotting the whole data set (except maybe for calculating the automatic y-scale, which should probably be the same for the full and the reduced data set). See attachment "origin-plot-large-data.png" for how (this old version) handles large datasets...
(In reply to hdu@apache.org from comment #19)
> With the spreadsheet component having been extended to handle more than a
> million rows the idea of issue 121058 to reduce the number of displayed
> items has its merits. Your random cloud example shows that the current
> behavior has some serious problems though:
> 1. the current reduction is by a constant factor of 50 when 10000 entries
> are involved
> 2. only the minimum and maximum values are considered interesting, other
> values are completely ignored
> 3. the point information displayed in the tooltip does no longer match
>
> Regarding 1. it should be reduced gracefully by limiting the number of
> displayed item to e.g. 2500 and calculating the reduction factor from there.
> Maybe the that reduction target number needs to be adjustable per chart.
>
> Regarding 2. the minimum and maximum numbers are most interesting but other
> values must not be ignored. Choosing a random item other than the min or max
> items is a good idea.
>
> Regarding 3. when reduction is active the tooltip annotation needs to
> disabled
Created attachment 81328 [details] how other programs handle large datasets -> skip points
approve showstopper request
Created attachment 81359 [details] Fix patch
The root cause of this defect is the fix for 121058. To save loading time and memory for tens of thousands of data points, only some points were picked: all data points were divided into many small groups, and only the min and max points of each group were kept.

The reason for picking min and max points instead of using a regular distance is this: in a column or line chart where most data points are around a max value of 100 and a min value of 10, the chart keeps its original outline only if we pick the max and min values from each group. With a regular distance we might only get some mid values (for example 50) and miss the chart outline.

But that mechanism never considered scatter or bubble charts, which have multiple data sequences in one series. A scatter chart series, for example, has two sequences, x values and y values, and an x value and a y value must form a pair. When the x values were picked, a min and a max value were picked; when the y values were picked, another min and max were picked. However, the picked min and max x values are probably not paired with the picked min and max y values. For example, if the indices of the min and max x values are 10 and 40, the y values No. 10 and No. 40 should be taken, but No. 30 and No. 35 might be taken instead. In this case the chart data are totally corrupt. That is the root cause.

My fix rolls back the code of 121058 in ScChart2DataSequence::getNumericalData and adds new fix code in ScChart2DataSequence::BuildDataCache(), so that the wrong axes data label issue mentioned by Regina Henschel can be fixed. The new code uses a regular distance to pick points, which fixes the problem for scatter charts and other charts that have multiple sequences in one series. If the distance is small enough, the aforementioned chart outline issue is no problem, so I use 5 as the regular distance.
Comment on attachment 81359 [details] Fix patch
I found a problem with this patch, so I am marking it obsolete. I'll upload a new patch soon.
Created attachment 81367 [details] Fix patch
The earlier fix patch has a problem: it cannot fix the hover tip problem mentioned by Regina Henschel. The hover tip always uses the drawing shape's name, but that name comes from the index assigned when the shapes are rendered. So if data points are picked before rendering, the index always runs from 1 to the number of points picked. To solve this, the picking must happen on the view side, that is, during rendering. As each chart type has its own rendering, my fix is only in AreaChart.cxx; area, line, XY and radar charts share one implementation in this file. I tested the sample file of 121058, which is a line chart, and converted it to area, XY and radar charts; they all have the performance problem. So my fix covers only these chart types. If other chart types with huge data have performance problems, they should be fixed separately.
I have done some further tests. I have removed the changes from bug 121058 and do no reduction of points, so that the relevant code is the same as in OOo3.4.1; the rest is version r1512966.

The file https://issues.apache.org/ooo/attachment.cgi?id=79599 from bug 121058 opens as well as it does in OOo3.4.1, in about 90 seconds, but the reaction time on a single click increases from about 60 seconds in OOo3.4.1 to 3 minutes.
The file attached here opens in 15 seconds and reacts immediately on a single click in both cases.
The file http://people.apache.org/~regina/Bug_122822_and_121058_XYChart_SinCosTan.ods opens in 90 seconds and reacts very slowly on a single click in both cases.
The file http://people.apache.org/~regina/Bug_122822_and_121058_CloudLarge_60000Values.ods crashes in both cases after about 45 seconds, in r1512966 with a "bad allocation" error message.

I suggest that the changes from bug 121058 are reverted for AOO4.0.1 and nothing else. There is no real regression compared to OOo3.4.1. For AOO4.1 the reason for the bad performance and for the crash should be found and fixed. If that is not possible, a reduction of points should be done, but with a UI so that the user can set the threshold.

My debug build crashes already with fewer data points. All tests were done on Windows 7.
Created attachment 81400 [details] Patch for only reverting 121058
This new patch only reverts the code of 121058.
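The pairing problem described in this comment can be reproduced with a small Python sketch. This is an illustration only, not the code from ScChart2DataSequence; `minmax_pick`, the sample data and the group size are made up for the example. Picking min/max independently for the x and the y sequence selects different indices, so the plotted (x, y) pairs never existed in the data, whereas a regular distance uses the same indices for both sequences.

```python
def minmax_pick(values, group_size):
    """Indices of the min and max of each group (hypothetical helper)."""
    picked = []
    for start in range(0, len(values), group_size):
        idx = range(start, min(start + group_size, len(values)))
        lo = min(idx, key=values.__getitem__)
        hi = max(idx, key=values.__getitem__)
        picked.extend(sorted({lo, hi}))
    return picked


# One scatter series: x and y values belong together index by index.
xs = [3, 0, 9, 5, 7, 1, 8, 2]
ys = [4, 6, 1, 9, 0, 5, 2, 7]
original_pairs = set(zip(xs, ys))

# Each sequence is reduced on its own, as if it were an independent column.
x_idx = minmax_pick(xs, 4)
y_idx = minmax_pick(ys, 4)
broken_pairs = [(xs[i], ys[j]) for i, j in zip(x_idx, y_idx)]

# Regular distance: one shared index list for both sequences.
shared_idx = range(0, len(xs), 2)
strided_pairs = [(xs[i], ys[i]) for i in shared_idx]

print(x_idx, y_idx)    # different indices for x and y
print(broken_pairs)    # pairs that are not in the original data
print(strided_pairs)   # pairs taken from the original data
```

In this small data set none of the independently picked pairs is an original data point, while every strided pair is.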
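The hover tip issue can be sketched in Python as follows. This is a simplified model, not the AreaChart.cxx implementation; `pick_on_view_side`, `tooltip` and the label format are assumptions for illustration. If points are picked before rendering, the renderer numbers the shapes 1..k; if the picking happens while rendering, the original index is still known and can be used for the label.

```python
def pick_on_view_side(values, step):
    """Keep every `step`-th value together with its original 0-based index."""
    return [(i, values[i]) for i in range(0, len(values), step)]


def tooltip(original_index, value):
    # Hypothetical label format; the real text is produced by the chart view.
    return f"Data Point {original_index + 1}: {value}"


values = [v * v for v in range(20)]
picked = pick_on_view_side(values, 5)

# Numbering a renderer would assign if it only saw the reduced sequence.
naive_numbers = list(range(1, len(picked) + 1))
# Numbering that is still available when the original index is carried along.
original_numbers = [i + 1 for i, _ in picked]
labels = [tooltip(i, v) for i, v in picked]

print(naive_numbers)     # 1..k, unrelated to the source rows
print(original_numbers)  # positions in the original data
print(labels)
```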
"hdu" committed SVN revision 1518091 into trunk: #i122822# revert fix for issue 121058
"hdu" committed SVN revision 1518092 into branches/AOO401: #i122822# revert fix for issue 121058
Reverting the original change in trunk and AOO401. For AOO 4.x there'll be a followup task for reducing the sample count on the view side.
The latest build in wiki is AOO401m1(Build:9710) - Rev. 1516414, wait for revision 1518092 for verification
It's verified fixed in build AOO401m4(Build:9713) - Rev. 1521921 with attached sample " Simpe Sample with Line Chart"
*** Issue 123284 has been marked as a duplicate of this issue. ***