ManifoldCF / CONNECTORS-1459

Tika service wrong Content-Type


Details

    Description

      I noticed that the standard behaviour of the Tika extractor connector is to replace the existing "Content-Type" metadata with the value it finds. The Tika service connector does not implement this behaviour: it adds a new metadata entry instead of replacing the existing one. As a result, the "Content-Type" metadata ends up with two values, but only the first one is kept by the connector. (Keeping only the first value could arguably be considered a bug as well; it happens with both the Tika extractor connector and the Tika service connector.)
      So, depending on the source connector, the resulting "Content-Type" may be wrong, for example when the originally provided value is "application/octet-stream".

      I will provide a patch for this bug.
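The difference between "adding" and "replacing" a multi-valued metadata field can be illustrated with a minimal sketch. It assumes a hypothetical metadata store (the `ContentTypeDemo` class with `add`, `set` and `getFirst` methods); it is not the ManifoldCF or Tika API. The value "application/pdf" simply stands in for whatever type Tika detects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical multi-valued metadata store, for illustration only (NOT the ManifoldCF or Tika API).
public class ContentTypeDemo {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // Append a value and keep any existing ones ("add" behaviour, as described for the Tika service connector).
    public void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Drop all existing values and store a single one ("replace" behaviour, as described for the Tika extractor).
    public void set(String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        fields.put(name, values);
    }

    // Return only the first value, mirroring the report that only the first value is kept.
    public String getFirst(String name) {
        List<String> values = fields.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    // Number of values currently stored for a field.
    public int count(String name) {
        List<String> values = fields.get(name);
        return values == null ? 0 : values.size();
    }

    public static void main(String[] args) {
        // "Add": the original value from the source connector stays first and wins.
        ContentTypeDemo added = new ContentTypeDemo();
        added.add("Content-Type", "application/octet-stream"); // provided by the source connector
        added.add("Content-Type", "application/pdf");          // assumed type detected by Tika
        System.out.println(added.getFirst("Content-Type"));    // application/octet-stream

        // "Replace": the detected value overwrites the original one.
        ContentTypeDemo replaced = new ContentTypeDemo();
        replaced.add("Content-Type", "application/octet-stream");
        replaced.set("Content-Type", "application/pdf");
        System.out.println(replaced.getFirst("Content-Type")); // application/pdf
    }
}
```

With "add", the field holds two values and a consumer that reads only the first one still sees "application/octet-stream"; with "replace", only the detected type remains.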

      Attachments

        1. CONNECTORS-1459.patch
          1 kB
          Julien Massiera


          People

            kwright@metacarta.com Karl Wright
            julienFL Julien Massiera
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: