Tika / TIKA-1722

Tika methods that accept a File needlessly convert it to a URL

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11
    • Component/s: core
    • Labels: None
    • Flags: Patch

      Description

      The following methods:

      • Tika.detect(File)
      • Tika.parse(File)
      • Tika.parseToString(File)

      convert the given File to a URL and delegate to the corresponding overload that accepts a URL.
      This looks like a shortcut, but it actually performs the following steps:

      1. Converts the file to a URI
      2. Converts the URI to a URL
      3. Calls TikaInputStream.get(URL, Metadata), which then performs the following special handling:
      4. Checks if the protocol is "file"
      5. Tries to convert the URL (back) to a URI
      6. Creates a File around the URI
      7. Checks if file.isFile()
      8. Calls TikaInputStream.get(File, Metadata)

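      The special handling in TikaInputStream.get(URL, Metadata) and TikaInputStream.get(URI, Metadata) is a good optimization for file resources that arrive as URLs in the wild, but internal callers that already hold a File do not need it. The File-accepting Tika methods can call TikaInputStream.get(File, Metadata) directly.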
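
      The round trip described above can be illustrated with the JDK alone. The sketch below is not Tika code: openFromFile and openFromUrl are hypothetical stand-ins for TikaInputStream.get(File, Metadata) and TikaInputStream.get(URL, Metadata), and they return the resolved File rather than a stream so that the two paths are easy to compare.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

class FileRoundTripSketch {

    // Hypothetical stand-in for TikaInputStream.get(File, Metadata):
    // the direct path that the issue proposes to use.
    static File openFromFile(File file) {
        return file;
    }

    // Hypothetical stand-in for TikaInputStream.get(URL, Metadata),
    // mirroring steps 4-8 of the description.
    static File openFromUrl(URL url) throws URISyntaxException {
        if ("file".equalsIgnoreCase(url.getProtocol())) { // step 4
            File file = new File(url.toURI());            // steps 5 and 6
            if (file.isFile()) {                          // step 7
                return openFromFile(file);                // step 8
            }
        }
        // The real method would fall back to opening the URL as a stream.
        return null;
    }

    // Steps 1-3: what the File-accepting methods did before the change.
    static File viaUrl(File file) throws Exception {
        URL url = file.toURI().toURL();                   // steps 1 and 2
        return openFromUrl(url);                          // step 3
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("tika-sketch", ".txt");
        tmp.deleteOnExit();

        File roundTripped = viaUrl(tmp);   // File -> URI -> URL -> URI -> File
        File direct = openFromFile(tmp);   // File used as-is

        System.out.println("via URL: " + roundTripped);
        System.out.println("direct:  " + direct);
    }
}
```

      Both paths end at the same file; the URL path adds three conversions and an extra isFile() check along the way.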

        Attachments

        1. TIKA-1722.patch
          2 kB
          Yaniv Kunda

            People

            • Assignee: Jukka Zitting (jukkaz)
            • Reporter: Yaniv Kunda (kunda)
            • Votes: 0
            • Watchers: 3