Issue Details

Key: LUCENE-671
Type: Improvement
Status: Resolved
Resolution: Won't Fix
Priority: Minor
Assignee: Unassigned
Reporter: Chris
Votes: 0
Watchers: 0

Lucene - Java

Hashtable based Document

Created: 14/Sep/06 03:18 PM   Updated: 12/Jan/08 11:22 PM
Component/s: Index, Search
Affects Version/s: 1.9, 2.0.0
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  HashDocument.java        2006-09-14 03:21 PM  Chris  11 kB
  TestBenchDocuments.java  2006-09-14 03:22 PM  Chris   6 kB

Resolution Date: 12/Jan/08 11:22 PM


Description
I've attached a Document implementation based on a hashtable, along with a performance test case. By my measurements it performs better in every case except enumeration, but it likely has a larger memory footprint. The existing Document test case will fail, because it accesses the "fields" variable directly and breaks when that variable is not the List it expects.

If nothing else, we would at least like to be able to extend Document, which is currently declared final. (Does anyone know what performance gain, if any, declaring a class final provides?) At the moment we have to maintain our own copy of Lucene in which these methods and classes are de-finalized and overridden.

There are other classes that could also be made non-final (Fieldable comes to mind), since project-specific changes can be useful in those as well, but that's off-topic.
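The attached HashDocument.java is not reproduced here. As a rough, hypothetical sketch of the idea described above, a document can group its fields by name in a hash map, so looking up a field by name is an average O(1) hash lookup instead of a linear scan of a list. `SimpleField` and `HashDocument` below are illustrative stand-ins, not Lucene's `Field` and `Document` classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HashDocumentSketch {

    /** Hypothetical stand-in for a stored field: just a name and a string value. */
    static final class SimpleField {
        final String name;
        final String value;

        SimpleField(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Fields grouped by name, so lookups by name are a hash lookup rather than a list scan. */
    static final class HashDocument {
        // LinkedHashMap keeps field names in the order they were first added.
        private final Map<String, List<SimpleField>> fields = new LinkedHashMap<>();

        void add(SimpleField f) {
            fields.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f);
        }

        /** First value for the name, or null if absent (average O(1)). */
        String get(String name) {
            List<SimpleField> list = fields.get(name);
            return list == null ? null : list.get(0).value;
        }

        /** All fields with the given name, as a read-only view. */
        List<SimpleField> getFields(String name) {
            List<SimpleField> list = fields.get(name);
            if (list == null) {
                return Collections.emptyList();
            }
            return Collections.unmodifiableList(list);
        }

        /** Enumerating every field has to walk every bucket and copy into a new list. */
        List<SimpleField> getAllFields() {
            List<SimpleField> all = new ArrayList<>();
            for (List<SimpleField> list : fields.values()) {
                all.addAll(list);
            }
            return all;
        }
    }

    public static void main(String[] args) {
        HashDocument doc = new HashDocument();
        doc.add(new SimpleField("title", "Lucene"));
        doc.add(new SimpleField("tag", "search"));
        doc.add(new SimpleField("tag", "index"));
        System.out.println(doc.get("title"));            // Lucene
        System.out.println(doc.getFields("tag").size()); // 2
        System.out.println(doc.getAllFields().size());  // 3
    }
}
```

One side effect of grouping by name: `getAllFields()` returns fields grouped by name rather than in their original interleaved order, and it must visit every bucket, which would be consistent with enumeration being the one case the reporter measured as slower.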



Comments
Chris added a comment - 14/Sep/06 06:54 PM
After some digging: http://www-128.ibm.com/developerworks/java/library/j-jtp1029.html

If these classes are declared final for performance reasons, that decision might be worth reconsidering. I know of at least one other development group that has to maintain its own Lucene tree for the same reason. (Both of us have had to modify FieldsWriter to store extra information about each field.)

Re: "(Fieldable comes to mind)"
Yes, I meant Field, and I'll look into AbstractField. Thanks, Mike.


Doug Cutting added a comment - 14/Sep/06 07:27 PM
The final declaration is not for performance. It is there to keep folks from thinking, if they subclass Document, that instances of their subclass will be returned to them in search results. To make Documents fully subclassable, one would need to make their serialization extensible.
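Doug's point can be illustrated with a toy model: stored fields persist only names and values, so whatever reads them back can only build the base class. The sketch below is hypothetical (`Doc`, `MyDoc` and `ToyIndex` are illustrative names, not Lucene's API), but as far as we know it mirrors how Lucene's stored-fields reader creates a plain `new Document()` for each hit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SubclassLossSketch {

    /** Hypothetical stand-in for Document: an ordered map of stored field values. */
    static class Doc {
        final Map<String, String> fields = new LinkedHashMap<>();

        void add(String name, String value) {
            fields.put(name, value);
        }
    }

    /** A user subclass carrying extra state that only lives in the Java object. */
    static class MyDoc extends Doc {
        int boostHint = 7;
    }

    /** Toy "index": like stored fields on disk, it keeps name/value pairs, never the class. */
    static final class ToyIndex {
        private final List<String[]> storedRecords = new ArrayList<>();

        int addDocument(Doc d) {
            List<String> flat = new ArrayList<>();
            for (Map.Entry<String, String> e : d.fields.entrySet()) {
                flat.add(e.getKey());
                flat.add(e.getValue());
            }
            storedRecords.add(flat.toArray(new String[0]));
            return storedRecords.size() - 1; // the document id
        }

        /** The reader can only construct the base type; subclass identity and extra state are lost. */
        Doc document(int docId) {
            String[] rec = storedRecords.get(docId);
            Doc d = new Doc();
            for (int i = 0; i < rec.length; i += 2) {
                d.add(rec[i], rec[i + 1]);
            }
            return d;
        }
    }

    public static void main(String[] args) {
        MyDoc mine = new MyDoc();
        mine.add("id", "42");
        ToyIndex index = new ToyIndex();
        int docId = index.addDocument(mine);
        Doc hit = index.document(docId);
        System.out.println(hit.fields.get("id"));  // 42
        System.out.println(hit instanceof MyDoc);  // false
    }
}
```

The stored values survive the round trip, but the hit is a plain `Doc`, which is exactly the surprise a non-final Document would invite.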

Karl Wettin added a comment - 14/Sep/06 07:59 PM

Cutting:
> To make Documents fully subclassable, one would need to make their serialization extensible.

I find this a bit strange, considering that RAMDirectory was not made serializable until a few months ago. But then, it might just have been a preemptive measure. Or perhaps people serialize documents without adding them to the index? That too sounds rather fishy.

I'm all for de-finalizing Term and Document, as my index in issue LUCENE-550 requires it.


Chris added a comment - 15/Sep/06 12:59 PM
> It is there to keep folks from thinking, if they subclass Document, that instances of their subclass will be returned to them in search results. To make Documents fully subclassable, one would need to make their serialization extensible.

Ahhh, that makes sense to me. I think providing a mechanism to tell the rest of Lucene which implementations of various classes to use is probably more trouble than it's worth. We'll just maintain our own tree, then.

Thanks
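The mechanism Chris decides against, letting callers tell Lucene which Document class to instantiate, might look roughly like the factory hook below. This is a hypothetical sketch only: `Doc`, `AuditedDoc`, `DocFactory` and `ToyReader` are illustrative names, not Lucene classes or a proposed Lucene API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class DocumentFactorySketch {

    /** Hypothetical stand-in for Document. */
    static class Doc {
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    /** A project-specific subclass with state that is not a stored field. */
    static class AuditedDoc extends Doc {
        String lastEditor; // never written to the index, so a reader cannot restore it
    }

    /** Hypothetical hook: the reader asks this factory for instances instead of calling new Doc(). */
    interface DocFactory {
        Doc newDocument();
    }

    /** Toy reader that rebuilds a document from already-loaded stored values. */
    static final class ToyReader {
        private final DocFactory factory;

        ToyReader(DocFactory factory) {
            this.factory = factory;
        }

        Doc load(Map<String, String> stored) {
            Doc d = factory.newDocument(); // the caller chooses the concrete class
            d.fields.putAll(stored);
            return d;
        }
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(AuditedDoc::new);
        Doc d = reader.load(Collections.singletonMap("id", "42"));
        System.out.println(d instanceof AuditedDoc);     // true
        System.out.println(((AuditedDoc) d).lastEditor); // null
    }
}
```

Even with such a hook, `lastEditor` comes back null: the subclass is restored, but its extra state is not, which is the serialization half of Doug's point and part of why the hook alone would not be enough.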