Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: cli, metadata
    • Labels:

      Description

      It would be great if the Tika CLI could also output metadata in the XMP format.

      1. tika-xmp_styleAndHeader.patch
        127 kB
        Jörg Ehrlich
      2. tika-xmp.patch
        88 kB
        Jörg Ehrlich

          Activity

          Jukka Zitting added a comment -

          Thanks! I applied the style and header patch in revision 1357199.

          Jörg Ehrlich added a comment -

          Added a patch with style changes and an adjusted licence header. No functional changes.

          Jörg Ehrlich added a comment -

          Thanks Jukka,

          I copied the IPTC header accidentally, and the Converter idea is good.
          I will first provide a patch with the style/header changes and then work on another one for the Converter idea.

          Jukka Zitting added a comment -

          OK, see my follow-up commits for the minor adjustments. A few things remain:

          • I didn't change the formatting of the code inside tika-xmp/src to avoid breaking any pending changes you may have. It would be good, however, to unify the formatting of that code with the rest of Tika, where we normally use four spaces (no tabs) for indentation and don't put an opening brace on a separate line.
          • The copyright headers mention the IPTC Photo Metadata standard. I didn't notice any IPTC metadata descriptions included in the relevant source files, so can we drop that extra copyright notice?
          • The tika-xmp component currently depends on tika-parsers just to get the list of media types supported by the relevant parser components. Could we instead make tika-parsers depend on tika-xmp and provide the Converter classes as part of the relevant o.a.t.parser.* packages? The TikaToXMP class could then access them using the same ServiceLoader mechanism that tika-core uses for Detector and Parser implementations.
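A minimal Java sketch of the ServiceLoader-based lookup suggested in the last point. `Converter`, `getSupportedTypes()` and `ConverterRegistry` are hypothetical names, not the actual tika-xmp API, and the standard `java.util.ServiceLoader` stands in for whatever loader tika-core uses internally; both read provider entries from `META-INF/services` files.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.Set;

// Hypothetical SPI (not the real tika-xmp API). Each o.a.t.parser.* package
// would ship implementations and list them in
// META-INF/services/<fully qualified name of Converter>.
interface Converter {
    // Media types, e.g. "image/jpeg", whose metadata this converter handles.
    Set<String> getSupportedTypes();
}

// Hypothetical lookup table that a TikaToXMP-style class could build.
final class ConverterRegistry {
    private final Map<String, Converter> byType = new HashMap<String, Converter>();

    ConverterRegistry(Iterable<? extends Converter> converters) {
        for (Converter converter : converters) {
            for (String type : converter.getSupportedTypes()) {
                // Keep the first converter found for a type; later ones are ignored.
                if (!byType.containsKey(type)) {
                    byType.put(type, converter);
                }
            }
        }
    }

    // Discovers implementations from META-INF/services entries on the class path.
    static ConverterRegistry fromServiceLoader() {
        return new ConverterRegistry(ServiceLoader.load(Converter.class));
    }

    // Returns the converter for a media type, or null if none is registered.
    Converter forType(String mediaType) {
        return byType.get(mediaType);
    }

    Set<String> supportedTypes() {
        return Collections.unmodifiableSet(byType.keySet());
    }
}
```

With this layout the dependency runs from tika-parsers to tika-xmp, and the registry only sees converters whose jars are actually on the class path, so tika-xmp no longer needs tika-parsers just to learn the supported media types.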
          Jukka Zitting added a comment -

          Nice work! I committed the latest patch in revision 1356202.

          There are a few minor issues, like the use of tabs instead of spaces for indentation and required updates to our licensing details. I can take care of those in a minute.

          Jörg Ehrlich added a comment -

          As TIKA-929 has already been resolved, I have deleted the previous two patches and am uploading a new one, which also contains adjustments for the latest tika-cli changes.

          Jörg Ehrlich added a comment -

          Uploading an update to the patch that depends on TIKA-929; the update fixes a test.

          Jörg Ehrlich added a comment -

          Version 5.1.1 of the XMPCore library, which is compatible with JDK 1.5/1.6, is now available on Maven Central.

          Jörg Ehrlich added a comment -

          The tika-xmp module provided by the patches uses the XMPCore library available in the Maven Central repository. Unfortunately, the current version 5.1.0 has been compiled for JDK 1.7, which is not compatible with Tika. We are in the process of uploading version 5.1.1, which will solve that problem. The patch can only be applied once XMPCore 5.1.1 is available.
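The incompatibility comes from the class file format: a JVM refuses to load classes whose major version is newer than it supports (49 for Java 5, 50 for Java 6, 51 for Java 7). The following Java sketch checks which release a class was compiled for; `ClassFileVersion` is a hypothetical helper, not part of Tika or XMPCore.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads the class file header to find the target JDK.
final class ClassFileVersion {
    private ClassFileVersion() {}

    // Returns the major version: 49 = Java 5, 50 = Java 6, 51 = Java 7.
    static int majorVersion(InputStream classBytes) throws IOException {
        DataInputStream in = new DataInputStream(classBytes);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a class file");
        }
        in.readUnsignedShort();        // minor_version, not needed here
        return in.readUnsignedShort(); // major_version
    }

    // True if a JVM of the given Java release (5, 6, 7, ...) can load the class.
    static boolean loadableOn(int javaRelease, int major) {
        return major <= javaRelease + 44;
    }
}
```

To check a jar such as XMPCore, pass in the stream returned by `java.util.jar.JarFile#getInputStream` for any `.class` entry; a result of 51 means the classes cannot be loaded on JDK 1.5 or 1.6.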

          Jörg Ehrlich added a comment -

          As stated in the comment, it is recommended to use the patch which depends on TIKA-929 to resolve this issue.

          Jörg Ehrlich added a comment -

          The tika-xmp_dependsOn_TIKA929changes.patch contains the same tika-xmp module as offered by the other patch, but depends on the patch from TIKA-929 being applied first.

          The recommendation is to use this one instead of tika-xmp.patch.

          Jörg Ehrlich added a comment -

          The tika-xmp.patch provides an extra Tika module that converts Tika Metadata to the XMP data model. It also integrates the module with the "-y" output option of Tika-app, thereby providing XMP output for the Tika CLI.

          The API extends the tika-core Metadata class but also offers the possibility to work directly with the XMP data model.
          The Metadata information from Tika can be converted either by mimetype-specific converters, which convert everything for their respective file format, or by a generic converter, which only converts fully qualified properties that use prefixes from registered namespaces.
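The two conversion paths can be sketched in Java as follows. This is a simplified illustration under stated assumptions: metadata is modelled as a single-valued `Map<String, String>` rather than Tika's multi-valued `Metadata` class, the namespace registry is a plain prefix-to-URI map instead of XMPCore's schema registry, and `XmpProperty`, `MetadataConverter`, `GenericConverter` and `ConverterSelector` are hypothetical names, not the patch's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// One XMP property assignment: namespace URI, local name and value.
final class XmpProperty {
    final String namespaceUri;
    final String name;
    final String value;

    XmpProperty(String namespaceUri, String name, String value) {
        this.namespaceUri = namespaceUri;
        this.name = name;
        this.value = value;
    }

    @Override
    public String toString() {
        return "{" + namespaceUri + "}" + name + "=" + value;
    }
}

// Hypothetical converter contract over simplified, single-valued metadata.
interface MetadataConverter {
    List<XmpProperty> convert(Map<String, String> metadata);
}

// Generic fallback: converts only keys of the form "prefix:name"
// whose prefix appears in the registered namespace map.
final class GenericConverter implements MetadataConverter {
    private final Map<String, String> namespaces; // prefix -> namespace URI

    GenericConverter(Map<String, String> namespaces) {
        this.namespaces = namespaces;
    }

    @Override
    public List<XmpProperty> convert(Map<String, String> metadata) {
        List<XmpProperty> result = new ArrayList<XmpProperty>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String key = entry.getKey();
            int colon = key.indexOf(':');
            if (colon <= 0 || colon == key.length() - 1) {
                continue; // not fully qualified, e.g. "Content-Type"
            }
            String uri = namespaces.get(key.substring(0, colon));
            if (uri == null) {
                continue; // prefix not registered, so the property is skipped
            }
            result.add(new XmpProperty(uri, key.substring(colon + 1), entry.getValue()));
        }
        return result;
    }
}

// Picks a mimetype-specific converter if one exists, else the generic one.
final class ConverterSelector {
    private final Map<String, MetadataConverter> byMimeType;
    private final MetadataConverter fallback;

    ConverterSelector(Map<String, MetadataConverter> byMimeType, MetadataConverter fallback) {
        this.byMimeType = byMimeType;
        this.fallback = fallback;
    }

    MetadataConverter select(String mimeType) {
        MetadataConverter specific = byMimeType.get(mimeType);
        return specific != null ? specific : fallback;
    }
}
```

For example, with only the "dc" prefix registered, the generic converter turns `dc:title` into a Dublin Core `title` property and drops both the unqualified `Content-Type` key and a key with an unknown prefix such as `foo:bar`.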

          Jukka Zitting added a comment -

          Rough first version committed in revision 1185805.


            People

            • Assignee:
              Jörg Ehrlich
              Reporter:
              Jukka Zitting
            • Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue
