Tika / TIKA-915

Image geodata being rounded to integers

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2
    • Fix Version/s: 1.3
    • Component/s: parser
    • Labels:
      None

      Description

      This was initially reported as an Alfresco issue, https://issues.alfresco.com/jira/browse/ALF-13004, but is actually a Tika problem. It seems that for some images, the geo metadata is being incorrectly rounded to an integer:

      $ tika --metadata 2012-02-19\ 16.43.29.jpg | grep --text geo
      geo:lat: 51.0
      geo:long: -1.0

      The image was actually taken at the following position (as extracted by exiftool):

      $ exiftool 2012-02-19\ 16.43.29.jpg | grep GPS
      ....
      GPS Altitude : 295 m Above Sea Level
      GPS Date/Time : 2012:02:20 16:44:22Z
      GPS Latitude : 51 deg 34' 32.74" N
      GPS Longitude : 1 deg 34' 4.39" W
      GPS Position : 51 deg 34' 32.74" N, 1 deg 34' 4.39" W
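For reference, the decimal values one would expect from the exiftool output above can be worked out by hand: degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The following is only a minimal sketch of that arithmetic; the class and method names are hypothetical and are not part of Tika or the metadata-extractor library.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not Tika or metadata-extractor code. */
final class DmsToDecimal {

    /**
     * Converts degrees, minutes and seconds to signed decimal degrees.
     * Hemispheres 'S' and 'W' yield negative values.
     */
    static double toDecimalDegrees(int degrees, int minutes, double seconds, char hemisphere) {
        double value = degrees + minutes / 60.0 + seconds / 3600.0;
        return (hemisphere == 'S' || hemisphere == 'W') ? -value : value;
    }

    public static void main(String[] args) {
        // Values taken from the exiftool output in this report.
        double lat = toDecimalDegrees(51, 34, 32.74, 'N');
        double lon = toDecimalDegrees(1, 34, 4.39, 'W');
        System.out.println(String.format(Locale.ROOT, "geo:lat: %.6f", lat));
        System.out.println(String.format(Locale.ROOT, "geo:long: %.6f", lon));
    }
}
```

With the position above this gives roughly 51.575761 and -1.567886, rather than the 51.0 and -1.0 that Tika reports.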

      The sample file for this example is available at <https://issues.alfresco.com/jira/secure/attachment/29236/2012-02-19+16.43.29.jpg>. We have the OK to use the photo in a test suite, but it is probably too large as-is, so for a unit test we may need to resize it while preserving the EXIF data.

      1. problem_jpeg_geo_test.diff
        7 kB
        Ray Gauss II
      2. testJPEG_GEO_2.jpg
        20 kB
        Ray Gauss II

          Activity

          Ray Gauss II added a comment -

          Moved the decimal formatting from the JpegParserTest to the GeotagHandler in r1367205.

          Ray Gauss II added a comment -

          Reopening this as the metadata-extractor library is adding false precision and we should be rounding its result.
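As an illustration of the kind of rounding meant here, the sketch below formats a decimal coordinate to a fixed maximum number of fractional digits. It is not the code committed to Tika: the class name, the pattern "0.######" and the choice of six decimal places are all assumptions made for the example.

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Hypothetical formatter for illustration only; not the committed Tika code. */
final class GeoDecimalFormat {

    /**
     * Formats a coordinate with at most six fractional digits (an assumed
     * precision), using locale-independent symbols so the decimal separator
     * is always '.'.
     */
    static String format(double coordinate) {
        DecimalFormat format =
                new DecimalFormat("0.######", new DecimalFormatSymbols(Locale.ROOT));
        format.setRoundingMode(RoundingMode.HALF_UP);
        return format.format(coordinate);
    }
}
```

A new DecimalFormat is created per call because DecimalFormat instances are not thread-safe; a parser that formats many values could instead keep one instance per thread.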

          Ray Gauss II added a comment -

          Resolved by r1366967

          Ray Gauss II added a comment -

          Resolved in r1366967

          Emmanuel Hugonnet added a comment -

          The patch for Tika has already been pushed: https://issues.apache.org/jira/browse/TIKA-811

          Ray Gauss II added a comment -

          I ended up pushing version 2.6.2 of the Drew Noakes metadata-extractor library to Sonatype myself [1] and it has been synced to central [2]. I'll hopefully have some time in the next few weeks to work on refactoring Tika for the changes in the library since 2.4.0-beta-1.

          [1] http://code.google.com/p/metadata-extractor/issues/detail?id=39#c15
          [2] http://search.maven.org/#browse%7C930506482

          Emmanuel Hugonnet added a comment -

          Be careful: there seem to be some incompatibilities between the xmpcore in the Maven repositories and the one used to compile metadata-extractor 2.6.2.
          https://code.google.com/p/metadata-extractor/issues/detail?id=55

          Ray Gauss II added a comment -

          Great news. Mr. Noakes is in the process of getting things set up to push to the Maven Central repo.

          He hit a snag with Adobe's XMPCore not being available but I pinged Jörg Ehrlich who informed me that Adobe was already working on making that artifact available, and it's live now.

          Ray Gauss II added a comment -

          Still no response from Mr. Noakes but he has just released two new versions: https://groups.google.com/forum/?fromgroups#!topic/metadata-extractor-announce/EhveUDBb78o

          I've tried once more via the Google project dev list. If there's no response by next week I'll begin the procedure to push it myself.

          Ray Gauss II added a comment -

          I've emailed Mr. Noakes.

          Nick Burch added a comment -

          If need be, we can ask for an updated version to be loaded into Maven Central. I believe that the best practice would be to ask Drew if he'd mind uploading the new version himself. If he's not interested in doing that, then you can ask to package and upload it yourself:

          https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide

          (It takes a couple of days to get set up, so don't expect to rush it!)

          Ray Gauss II added a comment -

          I've investigated the latest release, 2.5.0-RC3, of the Drew Noakes library and the bit of code that handles the geo extraction and parsing has in fact been updated and works with this particular image file.

          Unfortunately there are two issues in moving to that release:

          1) 2.4.0-beta-1 (the version we currently use) is still the latest available in the Maven Central repository.

          2) Major refactoring has been done on the Noakes side, and while I have completed much of that work locally there's still a fair amount to be done.

          I can complete that work on issue 2 if there's consensus on when and how we solve issue 1.

          Nick Burch added a comment -

          Thanks for the test, Ray. I've committed it (disabled!) in r1336225. We can enable it once it's all fixed upstream.

          Nick Burch added a comment - edited

          While I can see some users wanting to call out to exiftool to do the metadata parsing, many will want a pure Java solution. I believe the Drew Noakes library is the best pure Java one available under a suitable license, and so we'll need to stick with it.

          If you have some time, would you be able to look into filing a bug report with Drew Noakes (ideally with a unit test, and maybe even a patch)?

          Ray Gauss II added a comment -

          Attached is a patch which includes a test which demonstrates the failure and a smaller version of the example file.

          Unfortunately the patch also contains some of the changes from TIKA-859 in the test class so you can ignore those.

          Also, the grant of license only applies to the patch; the license to use the image was obtained by Nick.

          Ray Gauss II added a comment -

          Unfortunately this looks like an issue with the Drew Noakes library, possibly in this file: http://code.google.com/p/metadata-extractor/source/browse/trunk/Source/com/drew/metadata/exif/GpsDescriptor.java

          Shall I investigate further or are there plans to move away from this library?


            People

            • Assignee:
              Ray Gauss II
              Reporter:
              Nick Burch
            • Votes:
              1
              Watchers:
              3
