
Key: LUCENE-1166
Type: New Feature
Status: Resolved
Resolution: Fixed
Priority: Minor
Assignee: Grant Ingersoll
Reporter: Thomas Peuss
Votes: 0
Watchers: 1
Lucene - Java

A tokenfilter to decompose compound words

Created: 06/Feb/08 11:08 AM   Updated: 18/May/08 01:21 PM
Component/s: Analysis
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works CompoundTokenFilter.patch 2008-05-16 11:32 AM Thomas Peuss 106 kB
Text File Licensed for inclusion in ASF works CompoundTokenFilter.patch 2008-04-30 02:26 PM Thomas Peuss 106 kB
Text File Licensed for inclusion in ASF works CompoundTokenFilter.patch 2008-04-30 09:17 AM Thomas Peuss 105 kB
Text File Licensed for inclusion in ASF works CompoundTokenFilter.patch 2008-04-24 10:11 AM Thomas Peuss 99 kB
Text File Licensed for inclusion in ASF works CompoundTokenFilter.patch 2008-03-29 11:04 AM Thomas Peuss 90 kB
Text File Licensed for inclusion in ASF works CompoundTokenFilter.patch 2008-03-25 12:56 PM Thomas Peuss 90 kB
Text File Licensed for inclusion in ASF works CompoundTokenFilter.patch 2008-03-25 12:49 PM Thomas Peuss 91 kB
Text File Licensed for inclusion in ASF works CompoundTokenFilter.patch 2008-03-03 04:35 PM Thomas Peuss 90 kB
Text File Licensed for inclusion in ASF works CompoundTokenFilter.patch 2008-02-14 04:22 PM Thomas Peuss 85 kB
Text File Licensed for inclusion in ASF works CompoundTokenFilter.patch 2008-02-12 11:09 AM Thomas Peuss 76 kB
Text File Licensed for inclusion in ASF works CompoundTokenFilter.patch 2008-02-06 11:08 AM Thomas Peuss 71 kB
XML File de.xml 2008-02-06 11:10 AM Thomas Peuss 48 kB
File hyphenation.dtd 2008-02-06 11:11 AM Thomas Peuss 3 kB
Issue Links:
Reference
 

Lucene Fields: Patch Available
Resolution Date: 16/May/08 12:28 PM


 Description
A TokenFilter that decomposes the compound words found in many Germanic languages (such as German and Swedish) into single tokens.

For example, Donaudampfschiff would be decomposed into Donau, dampf and schiff, so that the word can be found even when the user only enters "Schiff".
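To illustrate the idea, here is a minimal dictionary-based sketch in plain Java. It is not the code from the attached patch: the class `DictionaryDecompounder`, its `decompose` method and the `minSubwordSize`/`maxSubwordSize` parameters are hypothetical. It simply tests every substring within the size limits against a lowercase dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the patch's implementation: brute-force dictionary decompounding.
class DictionaryDecompounder {

    // Returns every dictionary word found inside the (lowercased) compound, in order of start offset.
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize) {
        List<String> parts = new ArrayList<>();
        String lower = word.toLowerCase(Locale.ROOT);
        for (int start = 0; start < lower.length(); start++) {
            // Try every candidate length allowed by the size limits and the remaining characters.
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= lower.length();
                 len++) {
                String candidate = lower.substring(start, start + len);
                if (dictionary.contains(candidate)) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("donau", "dampf", "schiff");
        System.out.println(decompose("Donaudampfschiff", dict, 2, 15)); // prints [donau, dampf, schiff]
    }
}
```

Brute-force matching like this is easy to follow, but short dictionary entries can also match inside unrelated words, and the number of lookups grows with the word length. Whether the original token is kept next to its parts is left open in this sketch.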

I use the hyphenation code from the Apache XML project FOP (http://xmlgraphics.apache.org/fop/) for the first step of the decomposition. Currently I depend on the FOP jars directly, although I only use a handful of classes from the FOP project.
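The sketch below shows how hyphenation points could narrow the search: only substrings that start and end at a hyphenation boundary are looked up in the dictionary. The hyphenation points are hard-coded as an assumption (`Do-nau-dampf-schiff`); in the approach described above they would come from the FOP hyphenation code, whose API is deliberately not used here. `HyphenationDecompounder` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: combine consecutive hyphenation segments and keep those found in the dictionary.
class HyphenationDecompounder {

    // points: hyphenation offsets inside the word (assumed to come from a hyphenator),
    // e.g. {2, 5, 10} for "Do-nau-dampf-schiff".
    static List<String> decompose(String word, int[] points, Set<String> dictionary) {
        String lower = word.toLowerCase(Locale.ROOT);

        // Add the word boundaries so every segment runs from one boundary to another.
        int[] bounds = new int[points.length + 2];
        bounds[0] = 0;
        System.arraycopy(points, 0, bounds, 1, points.length);
        bounds[bounds.length - 1] = lower.length();

        List<String> parts = new ArrayList<>();
        for (int i = 0; i < bounds.length - 1; i++) {
            for (int j = i + 1; j < bounds.length; j++) {
                if (i == 0 && j == bounds.length - 1) {
                    continue; // skip the whole word itself
                }
                String candidate = lower.substring(bounds[i], bounds[j]);
                if (dictionary.contains(candidate)) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("donau", "dampf", "schiff");
        int[] points = {2, 5, 10}; // assumed hyphenation: Do-nau-dampf-schiff
        System.out.println(decompose("Donaudampfschiff", points, dict)); // prints [donau, dampf, schiff]
    }
}
```

With three hyphenation points only 9 candidate substrings are checked, far fewer than the brute-force approach, and matches that cross a syllable boundary in the middle are ruled out.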

My question:
Would it be OK to copy these classes into the Lucene project (renaming the packages, of course), or should I keep the dependency on the FOP jars? The FOP code is also licensed under the Apache License, Version 2.0.

What do you think?



Change History
Thomas Peuss made changes - 06/Feb/08 11:09 AM
Field Original Value New Value
Attachment CompoundTokenFilter.patch [ 12374854 ]
Thomas Peuss made changes - 06/Feb/08 11:10 AM
Attachment de.xml [ 12374855 ]
Thomas Peuss made changes - 06/Feb/08 11:11 AM
Attachment hyphenation.dtd [ 12374856 ]
Thomas Peuss made changes - 12/Feb/08 11:09 AM
Attachment CompoundTokenFilter.patch [ 12375343 ]
Thomas Peuss made changes - 14/Feb/08 04:22 PM
Attachment CompoundTokenFilter.patch [ 12375610 ]
Thomas Peuss made changes - 03/Mar/08 04:35 PM
Attachment CompoundTokenFilter.patch [ 12376987 ]
Thomas Peuss made changes - 25/Mar/08 12:43 PM
Attachment CompoundTokenFilter.patch [ 12378562 ]
Thomas Peuss made changes - 25/Mar/08 12:48 PM
Attachment CompoundTokenFilter.patch [ 12378562 ]
Thomas Peuss made changes - 25/Mar/08 12:49 PM
Attachment CompoundTokenFilter.patch [ 12378564 ]
Thomas Peuss made changes - 25/Mar/08 12:56 PM
Attachment CompoundTokenFilter.patch [ 12378565 ]
Thomas Peuss made changes - 29/Mar/08 11:04 AM
Attachment CompoundTokenFilter.patch [ 12378856 ]
Grant Ingersoll made changes - 24/Apr/08 01:04 AM
Priority Major [ 3 ] Minor [ 4 ]
Grant Ingersoll made changes - 24/Apr/08 01:06 AM
Assignee Grant Ingersoll [ gsingers ]
Thomas Peuss made changes - 24/Apr/08 10:11 AM
Attachment CompoundTokenFilter.patch [ 12380832 ]
Grant Ingersoll made changes - 30/Apr/08 01:17 AM
Status Open [ 1 ] In Progress [ 3 ]
Thomas Peuss made changes - 30/Apr/08 09:17 AM
Attachment CompoundTokenFilter.patch [ 12381174 ]
Thomas Peuss made changes - 30/Apr/08 02:26 PM
Attachment CompoundTokenFilter.patch [ 12381187 ]
Thomas Peuss made changes - 16/May/08 11:32 AM
Attachment CompoundTokenFilter.patch [ 12382162 ]
Grant Ingersoll made changes - 16/May/08 12:28 PM
Resolution Fixed [ 1 ]
Lucene Fields [New, Patch Available] [Patch Available]
Status In Progress [ 3 ] Resolved [ 5 ]
Thomas Peuss made changes - 18/May/08 01:21 PM
Link This issue relates to LUCENE-1287 [ LUCENE-1287 ]