Tika: TIKA-590

Create facility for deeper introspection of media files

    Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: metadata
    • Labels:
      None

      Description

      This feature would allow applications to dig deeper into files to derive metadata that is not present as a tag in the file. For example, if a file has no duration information, that missing information could be computed with a little more work. The idea is to let the API user distinguish between data that is quick to retrieve and data that is slower to retrieve because of the extra processing needed to get it.
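To make the duration example concrete: for constant-bitrate audio, a missing duration can be estimated cheaply from the size of the audio data and the bitrate, whereas variable-bitrate files may require scanning every frame, which is the kind of slower extra processing meant here. The `DurationEstimate` helper below is a hypothetical illustration, not part of Tika, and assumes a constant bitrate with tag bytes (e.g. ID3) already excluded.

```java
// Hypothetical helper; not part of Tika.
final class DurationEstimate {
    private DurationEstimate() {
    }

    // Cheap estimate for constant-bitrate audio: total payload bits / bits per second.
    // Assumes audioBytes excludes tag data (e.g. ID3) and the bitrate never changes.
    static double estimateSeconds(long audioBytes, int bitrateBitsPerSecond) {
        if (audioBytes < 0 || bitrateBitsPerSecond <= 0) {
            throw new IllegalArgumentException("size must be >= 0 and bitrate > 0");
        }
        return audioBytes * 8.0 / bitrateBitsPerSecond;
    }
}
```

For example, 4,000,000 bytes of 128 kbit/s audio gives 4,000,000 × 8 / 128,000 = 250 seconds.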

        Activity

        Nick Burch added a comment -

        I'm not sure how that would fit into the current model. However, something similar that might work is to set a value in the parse context indicating how much extra work you'd like the parsers to do.

        A rough idea would be something like:

            public enum ParserExtraWorkLevel { NONE, LIMITED, FULL }

            parseContext.set(ParserExtraWorkLevel.class, ParserExtraWorkLevel.FULL);
            parser.parse(stream, handler, metadata, parseContext);

        Then inside the parser you could check for the extra work level, and do more if requested.
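A minimal sketch of what that check might look like inside a parser. To keep it runnable without Tika on the classpath, `SimpleParseContext` is a hypothetical stand-in for Tika's `ParseContext` (which offers comparable `set(Class, value)` and `get(Class, defaultValue)` methods); `ParserExtraWorkLevel` is only the enum proposed in this comment, and `WorkLevelAwareParser` and its metadata keys are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Enum proposed in this issue; not an existing Tika class.
enum ParserExtraWorkLevel { NONE, LIMITED, FULL }

// Hypothetical stand-in for Tika's ParseContext so the sketch runs on its own.
final class SimpleParseContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object value = entries.get(key);
        return value == null ? defaultValue : key.cast(value);
    }
}

// Illustrative parser: tag values are always read, derived values only on request.
final class WorkLevelAwareParser {
    Map<String, String> parse(SimpleParseContext context) {
        Map<String, String> metadata = new HashMap<>();
        // Cheap: a value that is present as a tag in the file.
        metadata.put("title", "Example");

        ParserExtraWorkLevel level =
                context.get(ParserExtraWorkLevel.class, ParserExtraWorkLevel.NONE);
        switch (level) {
            case LIMITED:
                // Moderately expensive: e.g. estimate the duration from size and bitrate.
                metadata.put("duration-source", "estimate");
                break;
            case FULL:
                // Expensive: e.g. walk every frame to compute an exact duration.
                metadata.put("duration-source", "exact");
                break;
            default:
                // NONE: no work beyond reading tags.
                break;
        }
        return metadata;
    }
}
```

Defaulting to NONE when nothing is set keeps existing callers on the cheap path unless they opt in.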

        It's probably worth coming up with a concrete case first, though; once we have a patch that introduces some optional "expensive" work to a parser, we can decide on the best way forward.

        Andre-John Mas added a comment -

        Some cases I see:

        • hash
        • duration of song or movie
        • language tracks in movie

        I have looked into doing this for the MP3 file format, but it would require a second pass over the InputStream and, in certain cases, the use of other libraries. For this reason I am wondering whether an extension architecture would be needed. Imagine using a native library such as libvlc on certain platforms.
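Of these cases, a content hash is the simplest to illustrate: it cannot be read from a tag and only becomes available once every byte has been consumed, which is why it costs a full extra pass if normal parsing has already used the stream. A minimal sketch using only the JDK's `MessageDigest`; the `StreamHasher` class and the choice of SHA-256 are illustrative assumptions, not something specified in the issue.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative helper: the hash is "slow" metadata because it needs the whole stream.
final class StreamHasher {
    private StreamHasher() {
    }

    // Consumes the stream to its end and returns the hex-encoded SHA-256 digest.
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```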

        Nick Burch added a comment -

        TikaInputStream can help with the case of needing to do multiple passes over the stream.
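One generic way to get several passes over a one-shot stream is to spool it to a temporary file and reopen it once per pass; as far as we understand, this is broadly what `TikaInputStream.getFile()` does when the input is not already file-backed (worth checking against the Tika version in use). The sketch below uses only the JDK, and `TwoPassReader`, `twoPasses` and the byte-counting passes are hypothetical placeholders for a cheap header read followed by an expensive full scan.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of multi-pass reading via a spooled temporary file.
final class TwoPassReader {
    private TwoPassReader() {
    }

    // Returns {bytes seen by the quick header pass, bytes seen by the full pass}.
    static long[] twoPasses(InputStream in, int headerSize) {
        try {
            Path spooled = Files.createTempFile("two-pass-", ".tmp");
            try {
                // Copy the one-shot stream to disk once so it can be reopened.
                Files.copy(in, spooled, StandardCopyOption.REPLACE_EXISTING);

                long headerBytes;
                try (InputStream first = Files.newInputStream(spooled)) {
                    // Pass 1 (cheap): look only at the start of the file.
                    headerBytes = countBytes(first, headerSize);
                }
                long totalBytes;
                try (InputStream second = Files.newInputStream(spooled)) {
                    // Pass 2 (expensive): scan the whole file from the beginning.
                    totalBytes = countBytes(second, Long.MAX_VALUE);
                }
                return new long[] {headerBytes, totalBytes};
            } finally {
                Files.deleteIfExists(spooled);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads at most 'limit' bytes and returns how many were actually read.
    private static long countBytes(InputStream in, long limit) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        while (total < limit) {
            int read = in.read(buffer, 0, (int) Math.min(buffer.length, limit - total));
            if (read == -1) {
                break;
            }
            total += read;
        }
        return total;
    }
}
```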

        For the libvlc vs java case, you'd probably want something like:

        • A libvlc-powered movie parser (a mixture of Java and native code)
        • A pure-Java "switching" parser, e.g. one that uses the normal Java parser if ParserExtraWorkLevel is NONE or LIMITED, and the VLC one for FULL, assuming it loads OK

        You could then choose whether to use the switching + VLC parser simply by including or not including its jar.
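A rough sketch of the selection logic such a switching parser might use. `MediaParser` is a hypothetical one-method stand-in for Tika's `Parser` interface, the native parser is obtained through a `Supplier` so that a failure to load libvlc (typically an `UnsatisfiedLinkError`) can be caught, and `ParserExtraWorkLevel` is the enum proposed earlier in this issue rather than an existing Tika class.

```java
import java.util.function.Supplier;

// Enum proposed in this issue; not an existing Tika class.
enum ParserExtraWorkLevel { NONE, LIMITED, FULL }

// Hypothetical one-method stand-in for Tika's Parser interface.
interface MediaParser {
    String name();
}

// Pure-Java parser that delegates to a native-backed parser only when FULL work is requested.
final class SwitchingParser {
    private final MediaParser javaParser;
    private final Supplier<MediaParser> nativeParserLoader;

    SwitchingParser(MediaParser javaParser, Supplier<MediaParser> nativeParserLoader) {
        this.javaParser = javaParser;
        this.nativeParserLoader = nativeParserLoader;
    }

    MediaParser select(ParserExtraWorkLevel level) {
        if (level != ParserExtraWorkLevel.FULL) {
            // NONE or LIMITED: the ordinary Java parser is enough.
            return javaParser;
        }
        try {
            MediaParser nativeParser = nativeParserLoader.get();
            return nativeParser != null ? nativeParser : javaParser;
        } catch (LinkageError | RuntimeException e) {
            // e.g. UnsatisfiedLinkError when libvlc is not installed: fall back to Java.
            return javaParser;
        }
    }
}
```

Loading the native parser lazily means the libvlc dependency is only touched when FULL is actually requested.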

          People

          • Assignee: Unassigned
          • Reporter: Andre-John Mas
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
              Updated:
