[SOLR-1902] Tika no longer properly extracts content in Solr - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.2, 3.1, 4.0-ALPHA
Component/s: contrib - Solr Cell (Tika extraction)
Labels: None

Description

See http://www.lucidimagination.com/search/document/2ca3fe953038a54f/problem_with_pdf_upgrading_cell#22360c8261801f24

Since the upgrade to Tika 0.7, Tika appears to select an EmptyParser when documents are uploaded, which then outputs an empty XHTML representation. Strangely, the tests still pass.

Attachments

SOLR1902_patch_to_141.txt | 27/Jul/10 22:10 | 8 kB | Tommaso Teofili

Issue Links

is blocked by: TIKA-419 Allow parser lookup from a custom class loader (Resolved)

is related to: SOLR-2101 TikaEntityProcessor does not extract files- does not pick parser correctly (Closed)

People

Assignee: Grant Ingersoll

Reporter: Grant Ingersoll

Votes: 1

Watchers: 3

Dates

Created: 03/May/10 22:22

Updated: 30/Mar/11 15:45

Resolved: 05/Aug/10 14:28
