Tika: TIKA-614

Support HDF5 data files with the file extension *.h5.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.9
    • Fix Version/s: 0.10
    • Component/s: config
    • Environment: MacOS, JDK 1.6

      Description

      HDF5 data files sometimes come with the file extension *.h5. Add this extension to tika-mimetypes.xml in the tika-core package so that Tika supports these HDF5 files.
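      As a rough illustration of what a glob entry in tika-mimetypes.xml accomplishes, the Java sketch below maps simple "*.ext" patterns to a MIME type by file-name suffix. It is hypothetical and is not Tika's MimeTypes implementation; the type name application/x-hdf and the *.hdf and *.he5 patterns are assumptions about the existing HDF entry, and the case-insensitive match is a choice of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of extension (glob) based type lookup, in the spirit of
 * glob entries in tika-mimetypes.xml. Not Tika's implementation.
 */
public class GlobLookupSketch {

    // Assumed mapping from simple "*.ext" globs to a MIME type name.
    private static final Map<String, String> GLOBS = new LinkedHashMap<>();

    static {
        GLOBS.put("*.hdf", "application/x-hdf");
        GLOBS.put("*.he5", "application/x-hdf");
        GLOBS.put("*.h5", "application/x-hdf"); // the extension this issue asks for
    }

    /** Returns the MIME type whose glob matches the file name, or the fallback. */
    public static String lookup(String fileName, String fallback) {
        // Case-insensitive suffix match (a design choice of this sketch).
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : GLOBS.entrySet()) {
            String suffix = entry.getKey().substring(1); // "*.h5" -> ".h5"
            if (lower.endsWith(suffix)) {
                return entry.getValue();
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(lookup("example.h5", "application/octet-stream"));
    }
}
```

      With these assumed entries, lookup("granule.h5", "application/octet-stream") returns application/x-hdf, while a name that matches no glob falls back to the supplied default.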

          Activity

          Chris A. Mattmann added a comment -

          Hi Richard, I opened up TIKA-862 to discuss this. Let's move the conversation there.

          Chris A. Mattmann added a comment -

          Hi Richard, thanks for this! I'm going to open up a new issue and then link back to this one.

          Richard Yu added a comment -

          We were trying to extract metadata from our h5 file (i.e., a JPSS h5 file). We ran the following command:
          [ryu@localhost hdf5extractor]$ java -jar tika-app-1.0.jar -m \
          > /usr/local/staging/products/h5/SVM13_npp_d20120122_t1659139_e1700381_b01225_c20120123000312144174_noaa_ops.h5
          Content-Encoding: windows-1252
          Content-Length: 22187952
          Content-Type: text/plain
          resourceName: SVM13_npp_d20120122_t1659139_e1700381_b01225_c20120123000312144174_noaa_ops.h5
          [ryu@localhost hdf5extractor]$

          We noticed that the content type is text/plain and there are only 4 lines of output (we expected a lot of metadata).

          Let me know if more information is needed. Thanks!

          Richard

          Chris A. Mattmann added a comment -

          Hi Richard, thanks for your comment. Can you please describe the specific problem you are having, including a log message and/or command output? Feel free to open up a new JIRA ticket and I'll take a look. Thanks!

          Richard Yu added a comment -

          We are trying to use Tika to extract metadata from our h5 file (i.e., a JPSS h5 file). We ran into some problems and are not sure whether Tika works for all h5 files (i.e., with or without extensions).
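          Recognizing an HDF5 file without relying on its extension requires content (magic) detection. The Java sketch below shows such a check, assuming the HDF5 file format specification's rule that the 8-byte signature \211HDF\r\n\032\n is searched for at byte offset 0, then 512, then each successive doubling (1024, 2048, ...), which allows a user block to precede the superblock. The class name Hdf5SignatureCheck is hypothetical, and this is not Tika's magic-matching code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

/** Hypothetical sketch: locate the HDF5 format signature in a file. */
public class Hdf5SignatureCheck {

    // The 8-byte HDF5 format signature: \211 H D F \r \n \032 \n
    private static final byte[] HDF5_SIGNATURE = {
        (byte) 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'
    };

    /**
     * Returns the byte offset where the HDF5 signature was found, or -1.
     * Checks offsets 0, 512, 1024, 2048, ... while a full signature still fits.
     */
    public static long findSignatureOffset(RandomAccessFile file) throws IOException {
        byte[] buf = new byte[HDF5_SIGNATURE.length];
        long length = file.length();
        for (long offset = 0; offset + buf.length <= length;
                offset = (offset == 0) ? 512 : offset * 2) {
            file.seek(offset);
            file.readFully(buf);
            if (Arrays.equals(buf, HDF5_SIGNATURE)) {
                return offset;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
            long offset = findSignatureOffset(file);
            System.out.println(offset >= 0
                    ? "HDF5 signature found at byte offset " + offset
                    : "No HDF5 signature found");
        }
    }
}
```

          Run it as java Hdf5SignatureCheck path/to/file.h5. Because the signature need not sit at offset 0, a check that only looks at the first bytes could miss HDF5 files that carry a user block.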

          Chris A. Mattmann added a comment -

          Hi Cynthia, thanks for the patch! I've applied it without modification in r1080262. Note that the fix version should be an unreleased version, so I've marked it as 1.0. Thank you!

          Cynthia L Wong added a comment -

          Changes made to tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml.


            People

            • Assignee: Chris A. Mattmann
            • Reporter: Cynthia L Wong
            • Votes: 0
            • Watchers: 0
