Tika / TIKA-1528

Add an OverrideDetector that overrides other detectors

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Later

    Description

      While working on TIKA-1511, I found a need to bypass our current detection mechanism, and I think there are other use cases for this. The idea is that a client or a Tika-internal caller can specify the Content-Type for a document and bypass the regular MIME detection chain.

      We currently have the TypeDetector, which returns the "Content-Type" as specified in the Metadata, but there are two deficiencies in using that class for this purpose:

      • "Content-Type" is currently ambiguous: when it comes into a Parser or Detector, it could be meant as a hint or as a directive. I'd like the OverrideDetector to use a metadata key different from our usual "Content-Type".
      • The TypeDetector's position in the chain is determined by the alphabetical order of its class name. I'd like the OverrideDetector to run first and then short-circuit the other detectors.
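
      A minimal sketch of the proposed behaviour, under stated assumptions. It does not use Tika's classes: `Detector` here is a simplified stand-in for Tika's interface (media types are plain strings, the stream is a byte prefix), and `OverrideDetector`, `ShortCircuitChain`, and the key `X-Override-Content-Type` are hypothetical names chosen for illustration. The chain also stops at the first non-default answer, which is a simplification rather than a description of how Tika combines detector results.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for a detector: NOT Tika's real interface.
interface Detector {
    String detect(byte[] prefix, Map<String, String> metadata);
}

// Returns the type stored under a dedicated override key, or the default type if absent.
final class OverrideDetector implements Detector {
    // Hypothetical key, deliberately distinct from "Content-Type" so a hint is never read as a directive.
    static final String OVERRIDE_KEY = "X-Override-Content-Type";
    static final String OCTET_STREAM = "application/octet-stream";

    @Override
    public String detect(byte[] prefix, Map<String, String> metadata) {
        String forced = metadata.get(OVERRIDE_KEY);
        if (forced == null || forced.trim().isEmpty()) {
            return OCTET_STREAM;
        }
        return forced.trim();
    }
}

// Runs the override detector first; consults the other detectors only when no override is set.
final class ShortCircuitChain implements Detector {
    private final Detector override = new OverrideDetector();
    private final List<Detector> others;

    ShortCircuitChain(List<Detector> others) {
        this.others = others;
    }

    @Override
    public String detect(byte[] prefix, Map<String, String> metadata) {
        String forced = override.detect(prefix, metadata);
        if (!OverrideDetector.OCTET_STREAM.equals(forced)) {
            return forced; // short circuit: the remaining detectors never run
        }
        for (Detector d : others) {
            String type = d.detect(prefix, metadata);
            if (!OverrideDetector.OCTET_STREAM.equals(type)) {
                return type;
            }
        }
        return OverrideDetector.OCTET_STREAM;
    }
}

class OverrideDemo {
    // Toy magic-byte detector standing in for the regular detection chain.
    static String sniffPdf(byte[] prefix, Map<String, String> metadata) {
        String head = new String(prefix, 0, Math.min(prefix.length, 4), StandardCharsets.US_ASCII);
        return head.equals("%PDF") ? "application/pdf" : OverrideDetector.OCTET_STREAM;
    }

    public static void main(String[] args) {
        List<Detector> others = new ArrayList<>();
        others.add(OverrideDemo::sniffPdf);
        Detector chain = new ShortCircuitChain(others);
        byte[] pdfBytes = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);

        Map<String, String> metadata = new HashMap<>();
        metadata.put("Content-Type", "text/plain"); // ignored by the override detector
        System.out.println(chain.detect(pdfBytes, metadata)); // application/pdf

        metadata.put(OverrideDetector.OVERRIDE_KEY, "text/html");
        System.out.println(chain.detect(pdfBytes, metadata)); // text/html
    }
}
```

      In this sketch the ordering is fixed by construction rather than by class name, which addresses the second deficiency; the separate key addresses the first.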

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)
