I resolved the same problem today, and I think it is caused by Integer.parseInt(). For example, at the second line, Integer.parseInt("20051004130501") throws a NumberFormatException with the message "For input string: "20051004130501"".
So I changed the related classes (FieldCache.java, FieldCacheImpl.java, and FieldSortedHitQueue.java) to use Long.parseLong() rather than Integer.parseInt(). Now it sorts date values in yyyymmddhhmmsszzz format like a charm! I can make a patch for this right away if anyone needs it. Of course, the performance and memory impact should be tested thoroughly. Also, does anybody have a problem with sorting double values rather than float?
Why not just use a string sort rather than an integer (or long) sort?
AFAIK, sorting as numbers is much faster than sorting as strings, and it uses less memory as well.
Some developers want to sort on the value they have already inserted, and think it wastes disk space to store another 'same' field in the database or Lucene index just for sorting when they could sort on the existing values.
Thanks for resolving it. I didn't get your first note asking me to send you the JUnit test; sorry if I didn't respond. I can still build the unit test for my own use and send it here. Could you make a patch (a lucene.jar)? I will work on it this week and test it.
Thanks. Etienne.
Etienne, how are you specifying the sort? A string sort should work for this.
Cheolgoo, string sorting with no locale is faster than numeric sorting the first time you sort on a particular field. After the first time, string sorting will be the same speed as integer sorting, but faster than float sorting. The reason is that the strings are already sorted in the Lucene index, so their index order (ordinals) is compared. The string values themselves are never compared unless you are using MultiSearcher or you specify a different Locale to sort by. String sorting does use more memory, since an additional array of the unique terms is kept. See FieldCache.StringIndex. Unless a significant number of documents have unique values, a long[] in the FieldCache could take more room.
Hi,
the way I do the sort is something like this:
String sortingField = "date";
And the field is stored like this in the index:
public static void addFieldDate(Document luceneDoc, Date date, String field){
So it seems it is already a String that I stored?
Etienne.
> sorting = new Sort(sortingField, sorder);
Ahhh, that could be the problem. You aren't specifying what type to treat the field as, so it defaults to AUTO, meaning that Lucene tries to figure out the field type at sort time from the first term for that field in the index. Use the Sort constructor that takes a SortField, and specify string sorting.
Yep, I verified that AUTO sorting is your problem.
In FieldCacheImpl for an AUTO type, first Integer.parseInt() is tried, and if that fails, Float.parseFloat() is tried. For your case, parseInt() will fail because the value is too big, but parseFloat() will work (but not with enough precision).
Float.parseFloat("20051004130501")=2.00510045E13
Notice how two of the strings map to the same float when parsed as such.
Closing. You can reopen if string sorting doesn't solve the problem for you.
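The parsing behavior described in this comment can be reproduced outside Lucene with plain JDK calls. What follows is only a minimal sketch, not the FieldCacheImpl code: the class name AutoSortParsing, the helper parsesAsInt, and the second timestamp (one second after the first) are made up for illustration.

```java
class AutoSortParsing {
    // True if the term fits in a 32-bit int, i.e. Integer.parseInt() accepts it.
    static boolean parsesAsInt(String term) {
        try {
            Integer.parseInt(term);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String first = "20051004130501";
        String second = "20051004130502"; // hypothetical value, one second later

        // A 14-digit timestamp is far beyond Integer.MAX_VALUE (2147483647).
        System.out.println(parsesAsInt(first)); // false

        // A float keeps only about 7 significant decimal digits,
        // so both timestamps collapse to the same value.
        System.out.println(Float.parseFloat(first) == Float.parseFloat(second)); // true

        // A long holds the full value, so the order is preserved.
        System.out.println(Long.parseLong(first) < Long.parseLong(second)); // true

        // Fixed-width digit strings also compare in chronological order.
        System.out.println(first.compareTo(second) < 0); // true
    }
}
```

The last check is why a plain string sort is enough here: as long as every value has the same number of digits, lexicographic order and chronological order agree.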
Hi,
Thanks all, I think everything works well now. This way of sorting is not mentioned in the Lucene in Action book. Maybe it should be covered in a bit more detail there.
public Sort createSorting(ILuceneFilterParameterVO vo){
Thanks all. Etienne.
Please attach some code that demonstrates the problem.
Ideally this should be in the form of a JUnit test, so it can easily be incorporated into the existing unit tests. At a minimum, people will need to see code for a standalone program with a main function that requires no input, builds up an index in either a RAMDirectory or the current working directory, and does a search/sort using the same method you are using, in order to figure out what the problem might be.