Tika / TIKA-696

Extract watermarks from Word documents

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9
    • Fix Version/s: None
    • Component/s: parser
    • Labels:
      None

      Description

      It would be nice to extract the text of a watermark and store it as metadata.

      1. Demo with watermark.doc
        20 kB
        Julien Nioche
      2. Demo+with+watermark.docx
        24 kB
        Julien Nioche

        Activity

        Julien Nioche added a comment -

        Attached a .doc file containing a watermark.

        Julien Nioche added a comment -

        Can't see the watermark after saving the doc in .docx format and reopening it. I used OpenOffice to generate it.

        Julien Nioche added a comment -

        .docx version generated with MS Office

        Can't see the watermark with OO, but a reliable informer has told me that it is visible when the file is opened in MS Office.

        Julien Nioche added a comment -

        The text of the watermark can be found towards the end of word/header1.xml in the .docx:

        <v:textpath style="font-family:&quot;Calibri&quot;" fitpath="t" string="DRAFT CONTRACT"/>
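        A minimal sketch of pulling that string out of a .docx, assuming (as in the attached file) that the watermark is a VML `<v:textpath>` element whose `string` attribute holds the text. Since a document can carry several headers, this sketch scans every `word/header*.xml` part rather than only `header1.xml`. The function name `extract_docx_watermarks` is hypothetical, and this is not Tika's or POI's API; it uses only Python's standard `zipfile` and `xml.etree.ElementTree`.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace bound to the "v:" prefix in WordprocessingML header parts
VML_NS = "urn:schemas-microsoft-com:vml"

def extract_docx_watermarks(path):
    """Return the distinct watermark strings found in a .docx's header parts.

    Assumption: each watermark is a <v:textpath> element carrying its text
    in the 'string' attribute, as seen in word/header1.xml of the sample.
    `path` may be a filename or a binary file-like object.
    """
    found = []
    with zipfile.ZipFile(path) as zf:
        # Headers are stored as word/header1.xml, word/header2.xml, ...
        for name in sorted(zf.namelist()):
            if name.startswith("word/header") and name.endswith(".xml"):
                root = ET.fromstring(zf.read(name))
                # Match <v:textpath> by its fully qualified tag name
                for textpath in root.iter(f"{{{VML_NS}}}textpath"):
                    text = textpath.get("string")
                    # The same watermark is often repeated across headers
                    if text and text not in found:
                        found.append(text)
    return found
```

        For the attached .docx this would be expected to return `["DRAFT CONTRACT"]`. A parser could then store each string under a metadata key of its choosing.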
        
        Nick Burch added a comment -

        That should be fairly easy to add for .docx; .doc may take a little more work.


          People

          • Assignee:
            Unassigned
            Reporter:
            Julien Nioche
          • Votes:
            1
            Watchers:
            0
