Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9
    • Component/s: parser
    • Labels:
      None

      Description

      EXIFTool is a great Perl tool to extract metadata from tons of different media formats, in particular video, audio and images:

      http://www.sno.phy.queensu.ca/~phil/exiftool/

      Now that ExternalParser works, it's fairly easy to support this.

      EXIFTool can be installed on Mac with:

      $ brew install exiftool
      

      On CentOS Linux, you can do:

      $ sudo yum install perl-Image-ExifTool
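
After installing, it can be useful to confirm that the `exiftool` binary is actually reachable before wiring it into a parser. The following is a minimal sketch, not part of Tika: the helper name `exiftool_version` is hypothetical, and it assumes only that the executable is on the PATH and supports ExifTool's `-ver` option, which prints the version number.

```python
import shutil
import subprocess
from typing import Optional


def exiftool_version(executable: str = "exiftool") -> Optional[str]:
    """Return the installed ExifTool version string, or None if unavailable.

    Hypothetical helper for illustration; it is not part of Tika.
    """
    # Look the executable up on the PATH first so a missing install is cheap to detect.
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        # `exiftool -ver` prints just the version number.
        result = subprocess.run(
            [path, "-ver"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = exiftool_version()
    print("exiftool not found" if version is None else f"exiftool {version}")
```

A `None` result suggests the install step above did not put `exiftool` on the PATH for the current user.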
      

        Activity

        chrismattmann Chris A. Mattmann added a comment -
        • support added in r1681677.
        hudson Hudson added a comment -

        FAILURE: Integrated in tika-trunk-jdk1.7 #710 (See https://builds.apache.org/job/tika-trunk-jdk1.7/710/)
        CHANGES update for TIKA-1639. (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1681679)

        • /tika/trunk/CHANGES.txt

        fix for TIKA-1639: Add EXIFTool as an ExternalParser (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1681677)

        • /tika/trunk/tika-parsers/src/main/resources/org/apache/tika/parser/external/tika-external-parsers.xml
        gagravarr Nick Burch added a comment -

        How does this compare to our own Ray Gauss II's long-standing Tika EXIFTool work at https://github.com/Alfresco/tika-exiftool/ ?

        chrismattmann Chris A. Mattmann added a comment -

        Hey Nick Burch great question. Looking at Ray Gauss II's tool (which I had never seen before), here are the differences I see:

        1. The one I just wrote makes pure use of ExternalParser's config file; Ray calls it programmatically (which is the only reason it worked before the fix I just committed yesterday)
        2. Ray maps the metadata fields; mine just takes the names that EXIFTool gives us.
        3. Ray's is a hefty chunk of Java code; mine is ~10 lines in a config file.

        I'm happy to discuss how these things come together if there is an easy way for that to happen and for someone to maintain it (along with the Wiki docs I wrote). I just want Tika and EXIFTool to work together.
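
To illustrate point 2 (taking the names EXIFTool gives us as-is), here is a rough sketch of turning ExifTool's default `Tag Name : Value` output into key/value pairs. It is not how the ExternalParser does it (that is driven by the config file); the function name `parse_exiftool_output` and the sample values are made up for illustration.

```python
from typing import Dict


def parse_exiftool_output(text: str) -> Dict[str, str]:
    """Parse ExifTool's default 'Tag Name : Value' lines into a dict.

    Hypothetical helper: keys are kept exactly as ExifTool prints them,
    with no mapping onto Tika's own metadata names.
    """
    metadata: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as timestamps contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name : value' line
        name, value = name.strip(), value.strip()
        if name:
            metadata[name] = value
    return metadata


# Illustrative sample only; real output depends on the file and ExifTool version.
sample = """\
File Type                       : MP4
Image Width                     : 1280
Create Date                     : 2015:05:25 10:15:00
"""

if __name__ == "__main__":
    for key, value in parse_exiftool_output(sample).items():
        print(f"{key!r} -> {value!r}")
```

Because the keys come straight from the tool, they will not line up with the metadata names other parsers emit unless something maps them.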

        gagravarr Nick Burch added a comment -

        Generally we like to ensure that metadata is consistent between file formats, so that end users can have generic code not needing to worry about which exact format or parser they're dealing with, and also consistent between versions of Tika or the parser libraries/programs. As such, I do think that we need the mapping somewhere, be that code or config.

        chrismattmann Chris A. Mattmann added a comment -

        Hmm, I think one way to solve this then would be to have the ExternalParser have a section like so:

        <aliases>
          <metadata key="foo" alias="bar"/>
          <metadata key="foo2" alias="bar2"/>
        </aliases>
        

        Then, this could be looked up and used to ensure consistent metadata across Parsers. I've suggested this in TIKA-1640 and will implement it ASAP.
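
As a rough sketch of how such a lookup might behave (this is only an illustration of the proposal, not the TIKA-1640 implementation; `load_aliases` and `apply_aliases` are hypothetical names), the `<aliases>` section could be read into a map and applied to whatever keys the external tool emits:

```python
import xml.etree.ElementTree as ET
from typing import Dict

# The <aliases> fragment proposed above, with its placeholder keys.
ALIASES_XML = """
<aliases>
  <metadata key="foo" alias="bar"/>
  <metadata key="foo2" alias="bar2"/>
</aliases>
"""


def load_aliases(xml_text: str) -> Dict[str, str]:
    """Build a {tool key: canonical name} map from an <aliases> element."""
    root = ET.fromstring(xml_text.strip())
    return {m.attrib["key"]: m.attrib["alias"] for m in root.findall("metadata")}


def apply_aliases(metadata: Dict[str, str], aliases: Dict[str, str]) -> Dict[str, str]:
    """Rename keys that have an alias; keys without one pass through unchanged."""
    return {aliases.get(key, key): value for key, value in metadata.items()}


if __name__ == "__main__":
    aliases = load_aliases(ALIASES_XML)
    print(apply_aliases({"foo": "1", "unmapped": "2"}, aliases))
```

Letting unmapped keys pass through is one possible design choice; dropping or flagging them instead would be equally easy and may suit consistency better.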

        tallison@mitre.org Tim Allison added a comment - edited

        Chris A. Mattmann, thank you for adding this capability. The aliasing mod is important.

        More broadly, do you happen to know what file formats/metadata keys we'd get with the EXIFTool that we aren't currently pulling?

        It looks like you've chosen to trigger it for only the following now:

        video/avi
        video/mpeg
        video/x-msvideo
        video/mp4
        

        Ray Gauss II, clearly you saw the same need... Do you have any documentation on the benefits over what we had? Thank you!


          People

          • Assignee:
            chrismattmann Chris A. Mattmann
            Reporter:
            chrismattmann Chris A. Mattmann
          • Votes:
            0
            Watchers:
            4
