  1. Comdev GSOC
  2. GSOC-48

[GSoC] Test Document Generator/Permutator for Apache OpenOffice

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Won't Fix

    Description

      Apache OpenOffice is the leading open source desktop office suite. Our most recent release has had over 40 million downloads.

      The default document format for OpenOffice is Open Document Format (ODF). But we also can work with Microsoft document formats, including legacy binary formats (DOC/XLS/PPT) and their new XML formats (DOCX/XLSX/PPTX).

      A continuing challenge is finding an efficient way to test our support of these document formats. It is extremely laborious to create test documents. Imagine, for example, that we want to verify that we can correctly process table cell formatting. We have variations in text styles, in border styles, in fills, in alignment, etc. A complete test would require a large number of manually created test cases.

      Is it possible to do better than this? Can test documents be automatically generated?

      Presumably, yes, they can be automatically generated. We have open source libraries, in Java, that can read and write ODF and Microsoft documents:

      The Apache ODF Toolkit for ODF documents: http://incubator.apache.org/odftoolkit/

      Apache POI for Microsoft documents: http://poi.apache.org/

      But can this be made really easy, so that a QA tester, not a programmer, can generate test cases? Can we find a way to specify a test scenario and then generate a range of test documents in all three formats?

      Can we be smart about this and generate complete X*Y*Z sets of test cases as well as fractional factorial designs (http://en.wikipedia.org/wiki/Fractional_factorial_design)? For example, the factors for a text style might be: typeface, font size, weight, color, background color and alignment. A test of all combinations would lead to an enormous number of test cases, because of the huge number of colors and typefaces. But to be useful, we only need a subset of these test cases: the ones that are likely to reveal bugs. How can we be intelligent about this?
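
      To make the difference between a complete set and a reduced set concrete, here is a minimal sketch in plain Java with no ODF Toolkit or POI calls. It enumerates the full X*Y*Z set for a few text-style factors, then greedily picks a subset in which every pair of factor levels still appears at least once. Note the assumptions: the greedy all-pairs reduction is a simpler stand-in for a true fractional factorial design, and the class name, method names, and factor levels are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names and factor levels are illustrative assumptions.
public class TestCaseGenerator {

    /** Full factorial: every combination of every level of every factor. */
    static List<Map<String, String>> fullFactorial(Map<String, List<String>> factors) {
        List<Map<String, String>> cases = new ArrayList<>();
        cases.add(new LinkedHashMap<>());
        for (Map.Entry<String, List<String>> factor : factors.entrySet()) {
            List<Map<String, String>> extended = new ArrayList<>();
            for (Map<String, String> partial : cases) {
                for (String level : factor.getValue()) {
                    Map<String, String> next = new LinkedHashMap<>(partial);
                    next.put(factor.getKey(), level);
                    extended.add(next);
                }
            }
            cases = extended;
        }
        return cases;
    }

    /** All "factorA=x|factorB=y" pairs that a single test case exercises. */
    static Set<String> pairsOf(Map<String, String> testCase) {
        List<Map.Entry<String, String>> entries = new ArrayList<>(testCase.entrySet());
        Set<String> pairs = new LinkedHashSet<>();
        for (int i = 0; i < entries.size(); i++) {
            for (int j = i + 1; j < entries.size(); j++) {
                pairs.add(entries.get(i) + "|" + entries.get(j));
            }
        }
        return pairs;
    }

    /** Greedy all-pairs reduction: repeatedly keep the case covering the most uncovered pairs. */
    static List<Map<String, String>> pairwise(Map<String, List<String>> factors) {
        List<Map<String, String>> candidates = fullFactorial(factors);
        Set<String> uncovered = new LinkedHashSet<>();
        for (Map<String, String> c : candidates) {
            uncovered.addAll(pairsOf(c));
        }
        List<Map<String, String>> chosen = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            Map<String, String> best = null;
            int bestGain = 0;
            for (Map<String, String> c : candidates) {
                int gain = 0;
                for (String p : pairsOf(c)) {
                    if (uncovered.contains(p)) gain++;
                }
                if (gain > bestGain) {
                    bestGain = gain;
                    best = c;
                }
            }
            chosen.add(best);
            uncovered.removeAll(pairsOf(best));
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> factors = new LinkedHashMap<>();
        factors.put("typeface", List.of("Arial", "Times New Roman", "Courier New"));
        factors.put("fontSize", List.of("8", "12", "24"));
        factors.put("weight", List.of("normal", "bold"));
        factors.put("alignment", List.of("left", "center", "right"));

        // 3 * 3 * 2 * 3 = 54 combinations in the complete set.
        System.out.println("full factorial: " + fullFactorial(factors).size());
        System.out.println("pairwise subset: " + pairwise(factors).size());
    }
}
```

      Each resulting map describes one test document to be written, for example through the ODF Toolkit or POI. The greedy search scans the full set on every pass, so it only suits small factor lists; with fewer than two factors there are no pairs to cover and the reduced set is empty.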

      The specifications for the document formats are available as well, so we have a formal description of the schema for ODF and OpenXML. Is that information useful? Can we have "schema-directed test document creation"?

      As you can see, there is a broad range of things that could be done here, limited only by the time, skill and interests of the student. One could easily develop new ideas and research here that could be publishable. The results would be useful to Apache OpenOffice of course, but could potentially be applicable more broadly, to other products and other markup languages.

      Skills needed:

      – Java programming ("Core Java"): a good working knowledge, but you don't need to be a guru or anything

      – Knowledge of XML

      – Helps to have some awareness of QA, e.g., what "test coverage" is and why it is important.

      For more information and to discuss your GSoC proposal further, you can write to the Apache OpenOffice project's development mailing list: http://openoffice.apache.org/mailing-lists.html#development-mailing-list-public

          People

            Assignee: Unassigned
            Reporter: Rob Weir (robweir)
            Votes: 0
            Watchers: 2
