[LUCENE-2731] HyphenationCompoundWordTokenFilter fails to load DTD in Crimson parser (JDK 1.4) - ASF JIRA

Details

Type: Bug
Status: Reopened
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.9.4
Component/s: modules/analysis
Labels: None
Lucene Fields: New, Patch Available

Description

HyphenationCompoundWordTokenFilter loads the DTD for its XML parser from memory by supplying an EntityResolver. On Java 1.4 this does not work (it affects Lucene 2.9, and also later versions whenever an XML parser other than Apache Xerces is used), because Crimson never consults the entity resolver if no base URI is known. Since the hyphenation file is loaded from a Reader or InputStream, no base URI is available. Crimson needs at least a non-null systemId to proceed.

This patch (Lucene 2.9 only) works around the problem by setting a dummy systemId on the InputSource.
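
The workaround can be illustrated with a minimal sketch. This is not the attached patch: the class name `FakeSystemIdExample`, the one-line DTD, the sample document, and the `urn:java:` placeholder value are all assumptions chosen for illustration. It shows an in-memory DTD served through an `EntityResolver` while the `InputSource` is given a non-null dummy systemId, so that a parser which insists on a base URI still has one.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class FakeSystemIdExample {
    // Hypothetical in-memory DTD and document, for illustration only.
    static final String DTD = "<!ELEMENT hyphenation-info (#PCDATA)>";
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE hyphenation-info SYSTEM \"hyphenation.dtd\">\n"
        + "<hyphenation-info>demo</hyphenation-info>";

    /** Parses XML from a Reader and returns how often the resolver was asked. */
    static int parseWithFakeSystemId() {
        final int[] resolverCalls = {0};
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();

            // Serve the DTD from memory instead of the file system or network.
            reader.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId) {
                    resolverCalls[0]++;
                    return new InputSource(new StringReader(DTD));
                }
            });
            reader.setContentHandler(new DefaultHandler());

            InputSource source = new InputSource(new StringReader(XML));
            // A Reader carries no base URI, so supply a dummy, non-null systemId.
            // The value is a placeholder (assumption); it is never dereferenced
            // because the resolver above answers every lookup.
            source.setSystemId("urn:java:" + FakeSystemIdExample.class.getName());

            reader.parse(source);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return resolverCalls[0];
    }

    public static void main(String[] args) {
        System.out.println("resolver calls: " + parseWithFakeSystemId());
    }
}
```

On parsers that already consult the resolver without a base URI (such as Xerces), the extra `setSystemId` call should be harmless, since the resolver still supplies the DTD content.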

Attachments

LUCENE-2731.patch
31/Oct/10 10:04
1 kB
Uwe Schindler

Issue Links

is part of

LUCENE-2732 Fix charset problems in XML loading in HyphenationCompoundWordTokenFilter (also Solr's loader from schema)

Closed

People

Assignee: Uwe Schindler

Reporter: Uwe Schindler

Votes: 0

Watchers: 0

Dates

Created: 31/Oct/10 10:01

Updated: 27/Aug/24 15:35

Resolved: 31/Oct/10 14:51