  1. Tika
  2. TIKA-3783

Filename detection misses when a # and several . are in a filename


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.1
    • Component/s: detector
    • Labels: None
    • Environment: java 8
      TIKA-1928

    Description

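      NameDetector.detect() is meant to strip a fragment (the part after #) only when it follows the extension. When a filename contains several dots and hashes, it strips text that comes before the extension as well.

      Example: 'ABC#192.168.0.1#2.xxx' is stripped to 'ABC#192.168.0.1', so the '.xxx' extension is lost.

      Filenames can contain dots outside the extension, so the first dot is not a reliable marker for where the extension starts.

      The following line should be changed: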

      int dot = name.indexOf('.');
      

      https://github.com/apache/tika/blob/a52e4d153e950077f7fdedadcc5d75604fe2563d/tika-core/src/main/java/org/apache/tika/detect/NameDetector.java#L119

      to:

      int dot = name.lastIndexOf('.');
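
      The difference can be reproduced with a small standalone sketch. It is not the Tika source: the class FragmentStripDemo and the method stripFragment are hypothetical, and the surrounding logic (take the last '#', strip only if it comes after the dot) is an assumption based on the behaviour described above.

```java
public class FragmentStripDemo {

    // Hypothetical helper modelling the fragment-stripping step described in this issue.
    // useLastDot = false models the reported behaviour (indexOf),
    // useLastDot = true models the proposed change (lastIndexOf).
    static String stripFragment(String name, boolean useLastDot) {
        int hash = name.lastIndexOf('#');
        int dot = useLastDot ? name.lastIndexOf('.') : name.indexOf('.');
        // Assumption: the fragment is stripped only when the last '#' comes after the dot.
        if (hash != -1 && hash > dot) {
            return name.substring(0, hash);
        }
        return name;
    }

    public static void main(String[] args) {
        String name = "ABC#192.168.0.1#2.xxx";
        // First dot (index 7) precedes the last '#' (index 15), so the extension is cut off.
        System.out.println(stripFragment(name, false)); // ABC#192.168.0.1
        // Last dot (index 17) follows the last '#', so the name is left intact.
        System.out.println(stripFragment(name, true));  // ABC#192.168.0.1#2.xxx
        // A real fragment after the extension is still stripped.
        System.out.println(stripFragment("report.pdf#page=2", true)); // report.pdf
    }
}
```

      With lastIndexOf, a '#' is treated as the start of a fragment only when it appears after the final dot, which is where the extension begins.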
      

          People

            Assignee: Unassigned
            Reporter: nogaav Alexander