[LUCENE-3690] JFlex-based HTMLStripCharFilter replacement - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 3.5, 4.0-ALPHA
Fix Version/s: 3.6, 4.0-ALPHA
Component/s: modules/analysis
Labels: None

Lucene Fields: New, Patch Available

Description

A JFlex-based HTMLStripCharFilter replacement would be more performant and easier to understand and maintain.

Attachments

- LUCENE-3690-handle-utf16-surrogates.patch (13 kB), Steven Rowe, 23/Jan/12 07:30
- LUCENE-3690.patch (229 kB), Steven Rowe, 12/Jan/12 05:40
- LUCENE-3690.patch (230 kB), Steven Rowe, 12/Jan/12 07:00
- LUCENE-3690.patch (924 kB), Steven Rowe, 13/Jan/12 19:04
- LUCENE-3690.patch (2.36 MB), Steven Rowe, 16/Jan/12 09:45
- LUCENE-3690.patch (2.41 MB), Steven Rowe, 22/Jan/12 05:17
- JFlexHTMLStripCharFilterWarcTest.java (4 kB), Steven Rowe, 16/Jan/12 09:45
- jenkins_test.patch (1 kB), Robert Muir, 22/Jan/12 15:57
- HTMLStripCharFilterWarcTest.java (4 kB), Steven Rowe, 16/Jan/12 09:45
- BaselineWarcTest.java (3 kB), Steven Rowe, 16/Jan/12 09:45

Issue Links

requires:
- SOLR-882 HTMLStripReader improvement - padding corrected for hexadecimal entities, option not to emit padding at all added (Closed)

supersedes:
- LUCENE-2208 Token div exceeds length of provided text sized 4114 (Closed)
- SOLR-42 Highlighting problems with HTMLStripWhitespaceTokenizerFactory (Closed)

People

Assignee: Steven Rowe
Reporter: Steven Rowe
Votes: 3
Watchers: 2

Dates

Created: 12/Jan/12 05:36
Updated: 28/Aug/22 13:05
Resolved: 24/Jan/12 15:53