Key: DIRSERVER-629
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Critical
Assignee: Unassigned
Reporter: Emmanuel Lecharny
Votes: 0
Watchers: 0
Directory ApacheDS

Improve performance for search requests

Created: 02/Jun/06 12:59 AM   Updated: 25/Aug/06 11:02 AM
Component/s: None
Affects Version/s: 1.0-RC3
Fix Version/s: 1.0-RC4

Time Tracking:
Not Specified

File Attachments:
apacheds-SEARCH.log (text file, 29 kB), attached 2006-06-02 01:00 AM by Emmanuel Lecharny; licensed for inclusion in ASF works

Resolution Date: 25/Aug/06 11:02 AM


Description
Search requests are pathologically slow when the server is loaded with many entries. On a server holding 10,000 entries, a search for a random user takes around 50 ms, which caps throughput at 20 searches per second.

This happens because, when the server holds a lot of entries, a search parses the DN of every entry it looks at (cf. attached log). The parser is synchronized and takes around 0.5 ms to parse a DN, and about 100 DNs are parsed per search (cf. attached log again), so the whole 50 ms (100 × 0.5 ms) is spent parsing, parsing and parsing ...



Emmanuel Lecharny added a comment - 02/Jun/06 01:00 AM
A log in DEBUG mode for a simple search

Emmanuel Lecharny added a comment - 02/Jun/06 01:56 PM
After profiling the server (10 random search requests), I saw that DnParser is responsible for about 90% of the CPU usage (10,808 ms out of 12,124 ms).

Almost all of those DnParser calls (893) are made in the DnComparator.compare() method, which was already optimized months ago.

We parse DNs before comparing them because we want to be sure that they are normalized when we compare them. Obviously, this is overkill: the DN should already have been normalized in the upper layer, so in the backend we should have stored a normalized form of the DN (and of course the user-provided DN, too), and this parsing should not be necessary.

The question now is: should we store a DN as an object (with RDNs, attributes and values), or as a String? The advantage of storing a tree is that the comparison is very fast, but we may also need to store the serialized object.

We could also store an integer representing the hash code of the DN. If two DNs have different hash codes, they cannot be equal; if their hash codes are equal, we can then do a String comparison of their normalized forms to ensure that they are really equal.

Just some thoughts ...
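
The hash-code idea above can be sketched in Java as follows. This is only an illustration under assumptions, not the ApacheDS implementation: the class name NormalizedDn and the helper naiveNormalize are hypothetical, and the normalization shown (trimming around ',' and '=' and lower-casing) is a stand-in for real, schema-aware DN normalization.

```java
import java.util.Locale;

/** Hypothetical holder for a DN that is normalized once, when it is created. */
final class NormalizedDn {
    private final String userProvided; // DN exactly as the client sent it
    private final String normalized;   // canonical form, computed once
    private final int hash;            // cached hash code of the normalized form

    NormalizedDn(String userProvided) {
        this.userProvided = userProvided;
        this.normalized = naiveNormalize(userProvided);
        this.hash = normalized.hashCode();
    }

    /** Placeholder normalization (assumption): real LDAP rules are schema-dependent. */
    static String naiveNormalize(String dn) {
        return dn.trim()
                 .replaceAll("\\s*([,=])\\s*", "$1")
                 .toLowerCase(Locale.ROOT);
    }

    String getUserProvided() { return userProvided; }

    String getNormalized() { return normalized; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NormalizedDn)) return false;
        NormalizedDn other = (NormalizedDn) o;
        // Different hash codes prove inequality cheaply; equal hash codes
        // still require the String comparison to rule out a collision.
        return hash == other.hash && normalized.equals(other.normalized);
    }

    @Override
    public int hashCode() { return hash; }

    public static void main(String[] args) {
        NormalizedDn a = new NormalizedDn("CN=John Doe, OU=Users, DC=example, DC=com");
        NormalizedDn b = new NormalizedDn("cn=john doe,ou=users,dc=example,dc=com");
        System.out.println(a.equals(b));
    }
}
```

With this shape, no parsing happens at comparison time: the cost of normalization is paid once per DN instead of once per comparison.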

Alex Karasulu added a comment - 03/Jun/06 01:39 AM
This is horrible. Here's what I propose we do:

(1) Branch the 1.0 branch (the new branch is "optimization") and the shared-ldap code.
(2) Replace all Strings that pump a DN into the backend interface with your new LdapDN class, which has both the
      normalized DN and the user-provided DN.
(3) Change the comparator on the ndn index to use a String.equals() for now. This is the easiest way to get an immediate
      impact. The comparator for the updn must still be a normalizing one, since there is nothing we can do about it. However, we
      do not use the UPDN for searching based on scope, so it will not hurt us.
(4) Rerun the performance metrics and check the improvement.

If these enhancements do not solve this problem, we have to re-evaluate the default backend design.

WDYT?
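
A rough Java sketch of the comparator change in step (3), assuming the index keys are plain Strings. All names here (DnIndexSketch, normalize, NDN_COMPARATOR, UPDN_COMPARATOR) are hypothetical and are not taken from the ApacheDS code base; the normalization is a simplistic stand-in for the real one.

```java
import java.util.Comparator;
import java.util.Locale;
import java.util.TreeMap;

final class DnIndexSketch {
    /** Stand-in for real DN normalization (assumption: schema-aware in a real server). */
    static String normalize(String dn) {
        return dn.trim()
                 .replaceAll("\\s*([,=])\\s*", "$1")
                 .toLowerCase(Locale.ROOT);
    }

    // ndn index: keys are stored already normalized, so plain String
    // ordering is enough and no DN is parsed during a comparison.
    static final Comparator<String> NDN_COMPARATOR = Comparator.naturalOrder();

    // updn index: keys are raw user input, so every comparison has to
    // normalize both sides first. This is the expensive path.
    static final Comparator<String> UPDN_COMPARATOR =
            Comparator.comparing(DnIndexSketch::normalize);

    public static void main(String[] args) {
        TreeMap<String, Long> ndnIndex = new TreeMap<>(NDN_COMPARATOR);
        // The DN is normalized once at the boundary, before it reaches the index.
        ndnIndex.put(normalize("UID=jdoe, OU=Users, DC=example, DC=com"), 42L);
        // Lookups normalize the incoming DN once, then compare Strings only.
        System.out.println(ndnIndex.get(normalize("uid=jdoe,ou=users,dc=example,dc=com")));
    }
}
```

The design point is that the cost of normalization moves from every index comparison to a single call per incoming DN.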

Emmanuel Lecharny added a comment - 25/Aug/06 11:02 AM
This was fixed in May/June. Some more work could be done, but the search is now very fast. I think that the number of searches we had was also limited by Nagle's algorithm, which was not disabled.
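
For reference, plain Java sockets disable Nagle's algorithm through the standard Socket.setTcpNoDelay(true) call. The sketch below only illustrates that call; how the server's network layer was actually configured is not shown in this issue, and the class and method names (NoDelaySketch, withoutNagle) are hypothetical.

```java
import java.io.IOException;
import java.net.Socket;

final class NoDelaySketch {
    /** Sets TCP_NODELAY so small writes are sent immediately instead of being coalesced. */
    static Socket withoutNagle(Socket socket) throws IOException {
        socket.setTcpNoDelay(true);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // An unconnected socket is enough to show the option being set.
        try (Socket socket = withoutNagle(new Socket())) {
            System.out.println("TCP_NODELAY enabled: " + socket.getTcpNoDelay());
        }
    }
}
```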

Emmanuel Lecharny added a comment - 25/Aug/06 11:02 AM
No longer necessary.