Tika / TIKA-955

Unable to extract "Track Changes" metadata from a Microsoft Word document

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.9
    • Fix Version/s: None
    • Component/s: metadata
    • Labels:
      None
    • Environment:

      OS: Windows 7

      Description

      Microsoft Word documents have a feature to track and review changes. How can the Tika JAR help me identify such changes?

        Activity

        tpalsulich Tyler Palsulich added a comment -

        Is there interest in implementing this? Does anyone know of a standard which shows where/how this data is stored? If not, I'll close this as Won't Fix.

        tallison@mitre.org Tim Allison added a comment -

        Let's leave this one open. It is on my list to get to at some point... (far down the list, admittedly)

        kinow Bruno P. Kinoshita added a comment -

        >Is there interest in implementing this?

        I don't have a specific use case for it right now, but it sounds like this could be useful both for someone with a valid use case and for a quick analysis of the changes in a document.

        >Does anyone know of a standard which shows where/how this data is stored?

        The PROV ontology https://www.w3.org/TR/prov-overview/

        Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. The PROV Family of Documents defines a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the Web. This document provides an overview of this family of documents.

        There are libraries (https://github.com/lucmoreau/ProvToolbox) and even a tool in Apache incubator that utilises it (https://github.com/taverna/taverna-prov).

        Whenever I need to keep track of changes in entities in a system, I either use a simple audit table in some data storage system when it's simple enough, or adopt the provenance ontology.

        This could work for tracking changes in Microsoft Word documents.
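
On the question of where this data is stored: for .docx files, the WordprocessingML format (ECMA-376) records tracked insertions and deletions in `word/document.xml` as `w:ins` and `w:del` elements, each carrying `w:author` and `w:date` attributes. The following is a minimal sketch of reading them with the Python standard library. It is not Tika API, the function name `list_tracked_changes` is hypothetical, and it assumes a .docx (not the legacy binary .doc), looks only at the main document part (not headers, footers, or footnotes), and ignores formatting changes and moves.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_tracked_changes(docx_file):
    """Return tracked insertions/deletions found in word/document.xml.

    `docx_file` may be a path or a binary file-like object.
    Hypothetical helper for illustration; not part of Tika.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    changes = []
    for elem in root.iter():
        if elem.tag == f"{{{W_NS}}}ins":
            # Inserted runs keep their text in ordinary w:t elements.
            kind, text_tag = "insertion", f"{{{W_NS}}}t"
        elif elem.tag == f"{{{W_NS}}}del":
            # Deleted runs keep their text in w:delText elements.
            kind, text_tag = "deletion", f"{{{W_NS}}}delText"
        else:
            continue
        text = "".join(t.text or "" for t in elem.iter(text_tag))
        changes.append({
            "kind": kind,
            "author": elem.get(f"{{{W_NS}}}author"),
            "date": elem.get(f"{{{W_NS}}}date"),
            "text": text,
        })
    return changes
```

Each returned entry could then be mapped onto metadata fields or onto a provenance record of the kind described above; how Tika would expose this is left open here.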


          People

          • Assignee:
            Unassigned
            Reporter:
            priya.kujur@servient.com Priya Kujur
          • Votes:
            0
            Watchers:
            4
