Tika / TIKA-1239

Using Spring and Tika together. Need to extract the content and metadata.

    Details

    • Type: Task
    • Status: Closed
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: general, metadata, parser
    • Labels: None

      Description

      I need to use Spring with Tika. Is it thread-safe to use the following beans when they are injected from the bean context? I am injecting parseContext, handler, and parser into my class TikaImpl.
      ================
      <bean name="parseContext" class="org.apache.tika.parser.ParseContext"></bean>
      <bean name="parser" class="org.apache.tika.parser.AutoDetectParser"></bean>
      <bean name="handler" class="org.xml.sax.helpers.DefaultHandler"></bean>

      <bean id="tikaService" class="com.intech.tika.TikaImpl">
      <property name="parseContext" ref="parseContext"></property>
      <property name="parser" ref="parser"></property>
      <property name="handler" ref="handler"></property>
      <property name="resourcesize"><value>10485760</value></property>
      </bean>
      ===============
      My class has three methods: one to retrieve the metadata, one to retrieve the content, and one to retrieve both.

      1. To retrieve the metadata, I am using:
      parser.parse(stream, handler, metadata, parseContext);
      2. To retrieve the content, I am using:
      Tika tika = new Tika();
      tika.setMaxStringLength(resourcesize);
      String content = tika.parseToString(stream);
      3. To retrieve both, I am using:
      BodyContentHandler bodyContentHandler = new BodyContentHandler(resourcesize);
      Metadata metadata = new Metadata();
      parser.parse(TikaInputStream.get(stream), bodyContentHandler, metadata, parseContext);

      Question is:
      Is my approach thread-safe? I introduced three methods on the assumption that retrieving only the metadata with the first method is faster than using the third method.

      I would really appreciate your suggestions. Thank you in advance.
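
For illustration, here is a minimal sketch of the per-call pattern that the third method already follows: anything that accumulates state during a parse is created inside the method, so no two threads write to the same object. To stay self-contained it uses only the JDK's SAX classes and does not call Tika. `PerCallHandlerDemo`, `TextCollector`, `extractText`, and `extractConcurrently` are hypothetical names, and `TextCollector` is only a stand-in for a stateful handler such as `BodyContentHandler`. The sketch does not establish whether the shared `parser` and `parseContext` beans are themselves safe to share.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class PerCallHandlerDemo {

    // Hypothetical stand-in for a stateful handler: it accumulates
    // character data, so one instance belongs to exactly one parse.
    static final class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        String text() {
            return text.toString();
        }
    }

    // The handler and the JDK SAX parser are both created per call,
    // so nothing mutable is shared between concurrent callers.
    static String extractText(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        return handler.text();
    }

    // Runs extractText on several documents from a small thread pool
    // and returns the results in submission order.
    static List<String> extractConcurrently(List<String> docs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                futures.add(pool.submit(() -> extractText(doc)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList("<a>one</a>", "<a>two</a>", "<a>three</a>");
        System.out.println(extractConcurrently(docs));
    }
}
```

If a single `TextCollector` were injected as a singleton bean and reused by every call, concurrent parses would append into the same buffer. The same reasoning would apply to any handler or `Metadata` instance that collects output for one document.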

            People

            • Assignee: Unassigned
            • Reporter: iyersudheshna@gmail.com sudheshna iyer