Tika / TIKA-1722

Tika methods that accept a File needlessly convert it to a URL

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11
    • Component/s: core
    • Labels: None
    • Flags: Patch

    Description

      The following methods:

      • Tika.detect(File)
      • Tika.parse(File)
      • Tika.parseToString(File)

      convert the given File to a URL and delegate to the corresponding overload that accepts a URL.
      This looks like a shortcut, but it actually does the following:

      1. Converts the file to a URI
      2. Converts the URI to a URL
      3. Calls TikaInputStream.get(URL, Metadata), which then performs the following special handling (steps 4 to 8):
      4. Checks if the protocol is "file"
      5. Tries to convert the URL (back) to a URI
      6. Creates a File around the URI
      7. Checks if file.isFile()
      8. Calls TikaInputStream.get(File, Metadata)
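The conversions in steps 1 to 7 can be reproduced with the JDK alone. The sketch below is not Tika's code: the class name FileUrlRoundTrip and the helpers toUrl and toLocalFile are hypothetical, and only show that the round trip ends up with the same File it started from.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical illustration of the File -> URI -> URL -> URI -> File round trip.
class FileUrlRoundTrip {

    // Steps 1-2: what a File-accepting method does before delegating to the URL overload.
    static URL toUrl(File file) throws MalformedURLException {
        URI uri = file.toURI();   // step 1: File -> URI
        return uri.toURL();       // step 2: URI -> URL
    }

    // Steps 4-7: recover a File from the URL when it names a regular local file.
    // Returns null when the special case does not apply.
    static File toLocalFile(URL url) {
        if (!"file".equalsIgnoreCase(url.getProtocol())) {   // step 4
            return null;
        }
        try {
            File file = new File(url.toURI());               // steps 5-6
            return file.isFile() ? file : null;              // step 7
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null;                                     // not a usable file URL
        }
    }

    public static void main(String[] args) throws Exception {
        File original = File.createTempFile("tika-1722", ".txt");
        original.deleteOnExit();
        File recovered = toLocalFile(toUrl(original));
        // The recovered File should name the same path as the original.
        System.out.println(original.getAbsoluteFile().equals(recovered));
    }
}
```

Every step after the first only undoes the one before it, which is why the description calls the conversion needless.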

      The special handling in TikaInputStream.get(URL/URI) is a good optimization for file resources encountered in the wild, but internal callers do not need it: Tika can call TikaInputStream.get(File, Metadata) directly.
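The shape of the proposed change can be sketched with JDK-only stand-ins. Everything here is hypothetical (the class DirectFileOpen, the two open overloads, and the firstByte helpers); the overloads only play the role of TikaInputStream.get(URL, Metadata) and TikaInputStream.get(File, Metadata), and the attached patch may differ in detail.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical stand-ins; not Tika's actual classes or signatures.
class DirectFileOpen {

    // Plays the role of the File overload: opens the file directly.
    static InputStream open(File file) throws IOException {
        return new FileInputStream(file);
    }

    // Plays the role of the URL overload, including the "file" special case.
    static InputStream open(URL url) throws IOException {
        if ("file".equalsIgnoreCase(url.getProtocol())) {
            try {
                File file = new File(url.toURI());
                if (file.isFile()) {
                    return open(file);   // ends up in the File overload anyway
                }
            } catch (URISyntaxException | IllegalArgumentException e) {
                // Not a usable file URL: fall through to the generic path.
            }
        }
        return url.openStream();
    }

    // Before: a File-accepting method that detours through a URL.
    static int firstByteViaUrl(File file) throws IOException {
        try (InputStream in = open(file.toURI().toURL())) {
            return in.read();
        }
    }

    // After: the same method calling the File overload directly.
    static int firstByteDirect(File file) throws IOException {
        try (InputStream in = open(file)) {
            return in.read();
        }
    }
}
```

Both helpers read the same stream; the direct version just skips the conversions and checks that the URL path repeats.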

      Attachments

        1. TIKA-1722.patch (2 kB, Yaniv Kunda)

          People

            Assignee: Jukka Zitting (jukkaz)
            Reporter: Yaniv Kunda (kunda)