Apache OpenOffice (AOO) Bugzilla – Issue 107693
Calc crashes when referencing entire external sheet
Last modified: 2013-08-07 15:13:10 UTC
Create a new Calc spreadsheet and save it somewhere. Create a second new Calc spreadsheet, and in A1 enter "=SUM(", then select an entire sheet of the first document and hit Enter. Calc exits with a bad_alloc after allocating more than 1.7 GB of heap memory. This happens because all cells in the referenced range, including the empty ones, are stored in the external reference cache.
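To illustrate why caching every empty cell is so costly, here is a minimal sketch. It is not the actual external reference cache code: `SparseCellCache`, its methods, and the sheet dimensions are hypothetical names and values chosen for illustration. The idea is to store only non-empty cells, keyed by (row, column), so a reference to an all-empty sheet allocates nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Assumed sheet dimensions, for illustration only.
constexpr std::uint32_t kRows = 65536;
constexpr std::uint32_t kCols = 1024;

// Number of entries a cache needs if it stores every cell of one full sheet.
constexpr std::uint64_t denseCellCount()
{
    return static_cast<std::uint64_t>(kRows) * kCols;
}

// Hypothetical sparse cache: only cells that were explicitly set use memory.
class SparseCellCache
{
public:
    void setValue(std::uint32_t row, std::uint32_t col, double value)
    {
        assert(row < kRows && col < kCols);
        maCells[std::make_pair(row, col)] = value;
    }

    // True only for cells that were set; everything else counts as empty.
    bool hasValue(std::uint32_t row, std::uint32_t col) const
    {
        return maCells.count(std::make_pair(row, col)) != 0;
    }

    // Empty cells read as 0.0, which is how SUM treats them anyway.
    double getValue(std::uint32_t row, std::uint32_t col) const
    {
        auto it = maCells.find(std::make_pair(row, col));
        return it == maCells.end() ? 0.0 : it->second;
    }

    std::size_t storedCellCount() const { return maCells.size(); }

private:
    std::map<std::pair<std::uint32_t, std::uint32_t>, double> maCells;
};
```

With the assumed dimensions, a full sheet has 67,108,864 cells, so even a few tens of bytes per cached cell adds up to more than a gigabyte. In the sparse version, `storedCellCount()` stays at zero until a cell is actually set.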
My goodness. Did someone try to select an entire sheet (again) !? ;-) I'll look into it.
I did this to try out my fix for issue 107439 -- where the Excel export filter writes out the same external value again and again for every cell formula that uses this value. I added an optimization to leave out empty cells (which is what Excel does too). This helps to prevent huge files for formulas with big external references to empty ranges.
Gotcha. Either way this needs to be fixed.
I'm working on this in kohei04 cws right now.
Actually, this task is getting large enough that I'd like to use a separate CWS for this. Taking this off of kohei04's task list.
To get this one right, I need to rework ScMatrix class itself, to add a new type of matrix that's sparsely populated i.e. most of its elements are empty. I'm working on that right now.
I think that, by moving the current value-optimized matrix implementation code into its own impl class and making it swappable with another backend implementation, we can probably support both the existing number-optimized matrices and the new sparsely populated ones, with minimal disruption to the existing code that uses the ScMatrix class.
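A rough sketch of the swappable-backend idea, under stated assumptions: this is not the ScMatrix API, and `Matrix`, `MatrixImpl`, `DenseImpl`, `SparseImpl` and `Density` are hypothetical names. The public class forwards every call to an impl object, so client code sees the same behavior whichever storage is chosen.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical backend interface; storage details live behind it.
class MatrixImpl
{
public:
    virtual ~MatrixImpl() {}
    virtual void putDouble(double value, std::size_t col, std::size_t row) = 0;
    virtual double getDouble(std::size_t col, std::size_t row) const = 0;
    virtual bool isEmpty(std::size_t col, std::size_t row) const = 0;
};

// Dense backend: allocates a slot for every element up front.
class DenseImpl : public MatrixImpl
{
public:
    DenseImpl(std::size_t cols, std::size_t rows)
        : mnRows(rows), maValues(cols * rows, 0.0), maFilled(cols * rows, false) {}
    void putDouble(double value, std::size_t col, std::size_t row) override
    {
        maValues[col * mnRows + row] = value;
        maFilled[col * mnRows + row] = true;
    }
    double getDouble(std::size_t col, std::size_t row) const override
    {
        return maValues[col * mnRows + row];
    }
    bool isEmpty(std::size_t col, std::size_t row) const override
    {
        return !maFilled[col * mnRows + row];
    }
private:
    std::size_t mnRows;
    std::vector<double> maValues;
    std::vector<bool> maFilled;
};

// Sparse backend: stores only the elements that were actually set.
class SparseImpl : public MatrixImpl
{
public:
    explicit SparseImpl(std::size_t rows) : mnRows(rows) {}
    void putDouble(double value, std::size_t col, std::size_t row) override
    {
        maValues[col * mnRows + row] = value;
    }
    double getDouble(std::size_t col, std::size_t row) const override
    {
        auto it = maValues.find(col * mnRows + row);
        return it == maValues.end() ? 0.0 : it->second;
    }
    bool isEmpty(std::size_t col, std::size_t row) const override
    {
        return maValues.count(col * mnRows + row) == 0;
    }
private:
    std::size_t mnRows;
    std::unordered_map<std::size_t, double> maValues;
};

enum class Density { Filled, Sparse };

// Public class: same interface regardless of the backend in use.
class Matrix
{
public:
    Matrix(std::size_t cols, std::size_t rows, Density density)
        : mnCols(cols), mnRows(rows)
    {
        if (density == Density::Sparse)
            mpImpl.reset(new SparseImpl(rows));
        else
            mpImpl.reset(new DenseImpl(cols, rows));
    }
    void putDouble(double value, std::size_t col, std::size_t row)
    {
        assert(col < mnCols && row < mnRows);
        mpImpl->putDouble(value, col, row);
    }
    double getDouble(std::size_t col, std::size_t row) const
    {
        assert(col < mnCols && row < mnRows);
        return mpImpl->getDouble(col, row);
    }
    bool isEmpty(std::size_t col, std::size_t row) const
    {
        assert(col < mnCols && row < mnRows);
        return mpImpl->isEmpty(col, row);
    }
private:
    std::size_t mnCols;
    std::size_t mnRows;
    std::unique_ptr<MatrixImpl> mpImpl;
};
```

In this sketch, `Matrix(1024, 65536, Density::Sparse)` is cheap to construct because nothing is allocated per element, while `Density::Filled` keeps the contiguous layout that suits number-heavy matrices. Callers use `putDouble`, `getDouble` and `isEmpty` identically in both cases.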
Really? More changes to basic classes? How useful is this case anyway, shouldn't we just set an error if ScMatrix::GetElementsMax is exceeded?
What I need is to build a legitimate matrix that only has a few cells filled. We can't throw an error at those matrices since they are valid matrices. This is done for performance reasons, which I assume is useful.
BTW, other matrix implementations also support multiple backends depending on the nature of stored data. boost ublas is one such implementation. So, it's not unreasonable to have multiple backends for the matrix class. It's actually a necessity if you care about performance of matrix-related computations.
And then we're back to adjusting sumif or dsum again? External references could become an endless story, and I want to avoid that.
Ok. You clearly don't understand what I'm trying to do. I'll only modify how the matrix elements are stored. This won't alter the behavior of the matrix class, hence no adjustment is necessary in the client code. Period. BTW, what do you mean by the endless story? Do you want me to back off of this so that you can fix all this mess on your own?
I would like to see a fix that doesn't cause new problems. A small fix is much more likely to achieve this. If we want to change the matrix implementation, we can do it in a separate step, with enough testing, without the haste of fixing this regression.
> A small fix is much more likely to achieve this.
Likely, yes, but not always. There are times when you need to make a non-trivial change, to fix the underlying design mistake or do things the right way without resorting to many small hacks. This is unfortunately one such case.
> If we want to change the matrix implementation, we can do it in a separate step, with enough testing, without the haste of fixing this regression.
Fair enough. But without the change in the matrix implementation I can only fix this issue partially. So, I will leave this one open and file another issue, to incorporate the partial fix that only involves change in the external ref manager.
Re-targeting.
Reset assignee on issues not touched by assignee in more than 1000 days.