Apache OpenOffice (AOO) Bugzilla – Issue 80139
Avoid ERRORS in duplicate Spreadsheets and Data
Last modified: 2017-05-20 09:55:41 UTC
This issue makes it in my TOP 5 of major design flaws of existing spreadsheet applications. TOC === 1. INTRO / REAL CASES 2. PROBLEM 3. SOLUTIONS 3.1 COLLABORATIONS 3.2 DUPLICATE DATA / DATA LINKS 3.3 TRACK CHANGES 3.4 VERSIONING 4. OTHER RELEVANT ISSUES 1. INTRO ======== *Duplicate Data* perpetuates ERRORS! One encounters in current practice dozens of copies of the same spreadsheet because different people need to work with it, or because one needs only a subset of the original data (which is copied/saved into a new table/ spreadsheet). [This is NOT a duplicate of issue 8811.] One of my first tasks - as I started work in a big department - was to oversee the allocation of revenues to various contractors based on some specific characteristics. The first issue I stumbled upon, was the *lack of a central repository* where those characteristics were stored. Instead, various departments/employees had their own copy of the spreadsheet and everybody was updating his own copy. BUT no one had the spreadsheet up to date. There was NO way to get a fully *up to date* spreadsheet, and there was NO way to *track the changes*. As a second example, I will describe now a scenario that happened while I was doing some research. I was the only person involved in analysing the spreadsheet. However, due to the complexity of the spreadsheet, I created various (sub-) tables containing only relevant portions of the data. In the case I detected one error in the data, there was NO method to correct the error *only ONCE*, and have the correction applied in all sheets/data. The errors were doomed to persist in the various duplicate data. 2. THE PROBLEM ============== Existing spreadsheets offer very little mechanisms to avoid problems of duplicate data and the errors that arise because of this. While many will argue that in such cases a database should be used, common practice shows that most work is still done with spreadsheets, because of the following reasons: a.) easy to create (actually NO effort at all) b.) needs NOT a rigid structure (unlike a DB), therefore great flexibility in the beginning, when the data may be largely unknown c.) you see what you do, and you already see the results (unlike for DBs - you need a dedicated software to compute the results) d.) NO special coding skills needed 3. SOLUTIONS ============ As mentioned, a DB might be tentative, BUT NOT in real practice. Some spreadsheets offer some (incomplete) solutions (like Excel for issue 8811), BUT they will all fail on a global scale. 3.1 COLLABORATIONS ================== Spreadsheets should allow collaboration between people. This is described in greater detail in issue 8811. Features to be implemented: - allow simultaneous editing by different users - allow various locking mechanisms 3.2 DATA LINKS ============== Often, one needs only subsets of the original data to further process. One may filter this data out and then copy/paste it to a new spreadsheet. However, what is lacking is to paste NOT the actual data, but a link to the original data, like a *HARD-LINK* on Unix-like OSs. This is described in greater detail in issue 66817 (see http://www.openoffice.org/issues/show_bug.cgi?id=66817). A *HARD-LINK* would implement a mechanism through which a correction done in a copied cell is propagated back in the original cell. Another issue in this series is issue 34213 that involves pasting references to the original cells (see http://www.openoffice.org/issues/show_bug.cgi?id=34213). [Though the hard-links would be more powerful.] 3.3 TRACK CHANGES ================= A very useful feature for such collaborative efforts is to track the changes. This is even more important, as numerous simultaneous changes may slow down the computer due to the re-calculations. Therefore, a mechanism should be in place to disable automatic recalculations BUT to show which cells have potentially changed (and need be updated IF one needs those values). Some more brainstorming is here really indicated. 3.4 VERSIONING ============== Another problem of these spreadsheets arises directly due to the frequent changes. Previous spreadsheets might well be needed (because e.g. of legal reasons), so one has to save the various versions of the spreadsheet. I like in this respect especially the versioning scheme described in FORTRESS (see the Fortress specification, http://research.sun.com/projects/plrg/Publications/index.html), where new versions of the same program are saved alongside the old version. This Versioning-mechanism should be able to store snapshots of the spreadsheet for future use. [This is definitely more difficult than the case in FORTRESS, so some more brainstorming is warranted.] 4. RELEVANT ISSUES ================== I have already mentioned the relevant issues. There are surely more relevant issues (to come). In brief: - issue 8811: Allow multiple users to edit the same spreadsheet... - issue 66817: Paste as HARD-LINK (Unix style) - issue 34213: Paste as reference - still others to come ...
3.3 TRACK CHANGES ================= I have filed issue 80325 which deals with more advanced features in the track-changes category (see also issues blocking that one). Please note, this is not the simple track-changes that is lacking. The problem with a simple track changes is: 1.) nobody can track all the changes in a big-spreadsheet (consider a 100x1,000 = 100,000 cell spreadsheet, one needs years to review all changes in a multi-user environment) 2.) one wants to detect functional areas - track only some specific cells/ranges 3.) currently, IF a formula-calculated results changes, it is marked as changed (NOT the changing of the formula itself, BUT the computed value) 4.) when one detects a suspicious value, one wants to track the changes in upstream cells that resulted in this suspicious value 5.) a better history of changes - organized depending on the functional area
Sorry for a slight ERROR in my previous post: > 3.) currently, IF a formula-calculated results changes, > it is >>marked<< as changed (NOT the changing of the formula itself, > BUT the computed value) CORRECT ======= 3.) currently, IF a formula-calculated result changes, it is *NOT marked* as changed (NOT IF changing the formula itself, BUT when only the computed value changes)
Another way to avoid duplicate data is to reference external data using the existing mechanisms: a.) directly referencing data in a different spreadsheet [though hard-links would be more powerful] b.) using a DDE-Link [I will discuss this in a new post] Unfortunately, both methods show substantial weaknesses. As described in issue 3740: Named ranges in one spreadsheet cannot be accessed from another spreadsheet. (see http://www.openoffice.org/issues/show_bug.cgi?id=3740) Why are *named ranges* such important? A.) CORRECT RANGE ================= Well, IF the user wrote the master-spreadsheet he may be able to recall in which rows/columns he did store the relevant data. BUT how well will he remember these design details in 2 weeks? And in 2 month? And what are the chances, that a different user will reference the correct range? B.) CHANGING RANGE ================== Another even worse problem arises when the initial data range changes: say, one has to add another data row, so *ALL END USERS* will have to correct their spreadsheets that link to this master-spreadsheet. This is a formidable task and one that generates numerous *hard-to-trace errors*. These are the main reasons, why it is desirable to link to a named range!
An interesting project dealing with more advanced 'Track Changes' and 'Collaborative Editing' for spreadsheets (specifically for Calc) can be found on sourceforge.net: http://telltable-s.sourceforge.net/ see also the following presentation and article on this topic: http://www.site.uottawa.ca/~adler/talks/2003/SSScan-eusprig-jul2003.pdf http://www.site.uottawa.ca/~adler/publications/2004/adler-nash-noel-2004-challenges-in-collaborative.pdf
reassigning features and enhancements to user requirements@openoffice.org which will be the default owner for those tasks (was introduced some time ago)
Please attach real life example.
No info from author.