  TIKA-2910 (project: Tika)

Text extraction using Tika command line and Tika server differs


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.21
    • Fix Version/s: None
    • Component/s: parser

    Description

      When extracting plain text from the very same XML file with the Tika command-line utility and with Tika in server mode, the results differ.

      It looks as if PCDATA in more deeply nested XML elements is simply ignored, and only an empty line is returned.

      I assume both use the same code base. Are there any default settings that differ between the two, or that can be set?
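One way to pin down where the two outputs diverge is to run the same file through both paths and diff the results. The following is a minimal sketch, assuming tika-app 1.21 is available locally as `tika-app-1.21.jar` and a tika-server instance is listening on its default port 9998. The jar name, the server URL, and the helper names (`extract_with_cli`, `extract_with_server`, `diff_extractions`) are hypothetical and should be adjusted to the actual setup.

```python
import difflib
import subprocess
import urllib.request
from pathlib import Path

# Hypothetical locations: adjust to your installation.
# 9998 is tika-server's default port; /tika is its plain extraction endpoint.
TIKA_APP_JAR = "tika-app-1.21.jar"
TIKA_SERVER_URL = "http://localhost:9998/tika"


def extract_with_cli(path: str, jar: str = TIKA_APP_JAR) -> str:
    """Run the Tika command-line app with --text and return its output as UTF-8 text."""
    result = subprocess.run(
        ["java", "-jar", jar, "--encoding=UTF-8", "--text", path],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8")


def build_server_request(path: str, url: str = TIKA_SERVER_URL) -> urllib.request.Request:
    """Build a PUT of the raw file bytes to tika-server, asking for plain text back."""
    data = Path(path).read_bytes()
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )


def extract_with_server(path: str, url: str = TIKA_SERVER_URL) -> str:
    """Send the file to tika-server and return the extracted text."""
    with urllib.request.urlopen(build_server_request(path, url)) as resp:
        return resp.read().decode("utf-8")


def diff_extractions(cli_text: str, server_text: str) -> list[str]:
    """Return a unified diff of the two outputs; an empty list means they match."""
    return list(
        difflib.unified_diff(
            cli_text.splitlines(),
            server_text.splitlines(),
            fromfile="tika-app --text",
            tofile="tika-server /tika",
            lineterm="",
        )
    )
```

For the attached file, calling `diff_extractions(extract_with_cli("CorpusP_25471990.xml"), extract_with_server("CorpusP_25471990.xml"))` and printing each line shows exactly which nested elements lose their text in one path but not the other. The CLI output encoding is forced to UTF-8 so that platform default encodings do not show up as spurious differences.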

       

      Attachments

        1. CorpusP_25471990.xml (47 kB, uploaded by Walter)


          People

            Assignee: Unassigned
            Reporter: Walter (akit)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated: