TIKA-1716

Tika Server's recursive JSON output from /rmeta different than tika-app -J output

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11
    • Component/s: cli, server
    • Labels:
      None

      Description

      Over in Tika Python, we've received a request for exposing the XHTML output that Tika provides. I noticed that in TikaJAXRS the JSON output from /rmeta, which Tika Python uses, is different from the output of tika-app's -J option. For example, see GrobidJournalParser. I'm not sure they should be different. Maybe they should. But it would be nice to at least provide something like X:TIKA:XHTMLContent in /rmeta, the same way that tika-app -J provides it.
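
To make the request concrete: a client such as Tika Python receives a JSON array from /rmeta, one metadata object per document, and has to pull the extracted content out of each object. The following is a minimal sketch of that step, not Tika Python's actual code. The key name `X-TIKA:content`, the helper `extract_contents`, and the sample payload are assumptions made up for illustration.

```python
import json

# Assumed name of the metadata key that carries the extracted content.
CONTENT_KEY = "X-TIKA:content"


def extract_contents(rmeta_json):
    """Return the content string of every record in a recursive JSON result.

    Records without a content key yield an empty string.
    """
    records = json.loads(rmeta_json)
    return [record.get(CONTENT_KEY, "") for record in records]


# Hypothetical payload: a container document and one embedded attachment.
sample = json.dumps([
    {"resourceName": "paper.pdf", CONTENT_KEY: "<html><body>Main text</body></html>"},
    {"resourceName": "figure1.png"},
])

print(extract_contents(sample))
```

Whether the content string holds XHTML or plain text depends on which content handler the server used, which is the point of this issue.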

        Activity

        mahesh3 Mahesh added a comment -

        +1 for this

        muthu Muthupandi K added a comment -

        +1

        rajpravin Rajpravin added a comment -

        +1

        divya4 divya added a comment -

        +1

        tallison@apache.org Tim Allison added a comment -

        Should we add a parameter in the call to enable text vs. xhtml? I agree that /rmeta should be the same as default -J (note that with -J you can specify the content type with -t, -h, or the default, -x).

        tallison@mitre.org Tim Allison added a comment -

        How about:

        1. Switch default handler type to xml
        2. Allow user to specify handler type via PathParam or QueryParam

        If this is ok, any preference for path or query param? We've been using pathparams elsewhere.
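
The two-step proposal above can be sketched as a small routing function: an optional path segment after /rmeta names the handler type, and xml is used when the segment is absent. This is only an illustration in Python, not the Java code that was committed. The function name `resolve_handler_type`, the set of accepted names, and the choice to reject unknown names rather than fall back to the default are all assumptions.

```python
DEFAULT_HANDLER_TYPE = "xml"

# Assumed set of handler type names, mirroring the -x, -h and -t options.
KNOWN_HANDLER_TYPES = {"xml", "html", "text"}


def resolve_handler_type(path):
    """Return the handler type named after /rmeta, or the default if none is given."""
    segments = [segment for segment in path.split("/") if segment]
    if not segments or segments[0] != "rmeta":
        raise ValueError("not an /rmeta path: " + path)
    if len(segments) == 1:
        # No path parameter supplied: use the new default.
        return DEFAULT_HANDLER_TYPE
    requested = segments[1].lower()
    if requested not in KNOWN_HANDLER_TYPES:
        raise ValueError("unknown handler type: " + segments[1])
    return requested


print(resolve_handler_type("/rmeta"))
print(resolve_handler_type("/rmeta/text"))
```

A path parameter keeps the handler choice visible in the URL itself, which matches the convention the comment says is used elsewhere in the server.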

        chrismattmann Chris A. Mattmann added a comment -

        Fantastic idea. I'm going to implement the fix you suggest above, Tim. My preference is for @PathParam.

        tallison@mitre.org Tim Allison added a comment -

        OK. I have a patch nearly ready. Will commit tomorrow.

        tallison@mitre.org Tim Allison added a comment -

        r1698329

        Let me know if there are any surprises.

        hudson Hudson added a comment -

        SUCCESS: Integrated in tika-trunk-jdk1.7 #846 (See https://builds.apache.org/job/tika-trunk-jdk1.7/846/)
        TIKA-1716 change default /rmeta content handler to xml and allow users to specify which content handler to use for content (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1698329)

        • /tika/trunk/CHANGES.txt
        • /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/builders/DefaultContentHandlerFactoryBuilder.java
        • /tika/trunk/tika-core/src/main/java/org/apache/tika/sax/BasicContentHandlerFactory.java
        • /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/RecursiveMetadataResource.java
        • /tika/trunk/tika-server/src/test/java/org/apache/tika/server/RecursiveMetadataResourceTest.java
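
From the client side, the committed change means a caller can request a specific content handler by extending the /rmeta path. The sketch below only builds the HTTP request and does not send it. The server address, the helper name `build_rmeta_request`, and the `text` path segment are assumptions for illustration, not details taken from the commit.

```python
import urllib.request

# Assumed server address; adjust to wherever tika-server is listening.
TIKA_SERVER = "http://localhost:9998"


def build_rmeta_request(document_bytes, handler_type=None):
    """Build (but do not send) a PUT request for /rmeta or /rmeta/<handler_type>."""
    url = TIKA_SERVER + "/rmeta"
    if handler_type is not None:
        url += "/" + handler_type
    return urllib.request.Request(
        url,
        data=document_bytes,
        method="PUT",
        headers={"Accept": "application/json"},
    )


default_request = build_rmeta_request(b"example bytes")
text_request = build_rmeta_request(b"example bytes", "text")
print(default_request.full_url)
print(text_request.full_url)
```

Passing either request object to `urllib.request.urlopen` would send it to a running server.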
        chrismattmann Chris A. Mattmann added a comment -

        Thanks, Tim Allison. I'll give this a try in #65.

        chrismattmann Chris A. Mattmann added a comment (edited) -

        Tim, in #65 and #67 I've gone ahead and implemented it. Works great!


          People

          • Assignee:
            tallison@mitre.org Tim Allison
            Reporter:
            chrismattmann Chris A. Mattmann
          • Votes:
            4
            Watchers:
            8
