  1. Directory ApacheDS
  2. DIRSERVER-1965

An Index should speed up searches starting with '*'

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-M16
    • Fix Version/s: 2.0.0-M18
    • Component/s: ldap
    • Labels:
      None
    • Environment:
      NA

      Description

      As of now, an index speeds up searches for an exact match and for substring filters that end with "*" (a prefix match).

      It does NOT currently speed up searches that start with "*" (a suffix match), which is what we need implemented.

      Example: in our unified messaging application, when we receive an incoming call, we have to find the user whose telephone number ends with the digits signaled to us by the telephone network.
      Let's say a user has the telephone number +49(777)12345678. The telephone network signals only the extension, 678. We therefore search for *678 to get a list of possible users and, by applying some extra magic, pick the right one.
      The problem is that we need to do this fast, because taking the call depends on it.
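The asymmetry described here follows from how an ordered index is laid out. The sketch below is only an illustration under assumptions: it uses a plain java.util.TreeSet as a hypothetical stand-in for a sorted index (it is not the ApacheDS index implementation), and the extra telephone numbers are made up. It shows that a "prefix*" filter maps to one contiguous range of the sorted keys, while a "*suffix" filter has to visit every key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class SortedIndexDemo {
    // Hypothetical stand-in for an ordered index over attribute values.
    static final NavigableSet<String> INDEX = new TreeSet<>(List.of(
            "+49(777)12345678", "+49(777)12345999", "+49(888)55500678", "+1(555)0100"));

    // "prefix*": all matches are adjacent in sort order, so a range scan is enough.
    static List<String> startsWith(String prefix) {
        // prefix + '\uffff' serves as an upper bound for strings starting with prefix
        // (values whose next character is '\uffff' itself are ignored in this sketch).
        return new ArrayList<>(
                INDEX.subSet(prefix, true, prefix + Character.MAX_VALUE, false));
    }

    // "*suffix": matches are scattered across the sort order, so every key is checked.
    static List<String> endsWith(String suffix) {
        List<String> hits = new ArrayList<>();
        for (String value : INDEX) {
            if (value.endsWith(suffix)) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(startsWith("+49(777)"));
        System.out.println(endsWith("678"));
    }
}
```

The range scan touches only the matching keys, whereas the suffix scan grows with the total number of indexed values.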

        Activity

        akiran Kiran Ayyagari added a comment -

        This fix gave a significant performance gain.

        akiran Kiran Ayyagari added a comment -

        "We might want to evaluate the pros and cons of such an approach."

        Emmanuel Lecharny: yes, this will slow down the add operation though.

        akiran Kiran Ayyagari added a comment -

        Committed a fix here http://svn.apache.org/r1611363, without which the earlier optimization doesn't work.

        elecharny Emmanuel Lecharny added a comment -

        Some other LDAP servers index triplets. For instance, a sentence like
        "Hello World!" will be indexed using the following triplets:
        'hel', 'ell', 'llo', 'lo ', 'o w'...

        For an N-letter sentence, we will create N-2 entries in the index. This is very expensive. OTOH, it allows all kinds of substring searches.

        We might want to evaluate the pros and cons of such an approach.
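The triplet scheme can be sketched as follows. This is a minimal illustration, not how any particular LDAP server implements it: the class name, the in-memory posting map, and the sample values are all assumptions. A string of N characters yields N-2 overlapping triplets, each of which becomes an index key, which is where the write cost comes from. A substring search then intersects the posting sets of the needle's triplets and confirms each candidate with a real substring check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TrigramDemo {
    // Hypothetical in-memory posting lists: triplet -> values containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Splits a value into its overlapping, lower-cased 3-character windows.
    static List<String> trigrams(String value) {
        String s = value.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            out.add(s.substring(i, i + 3));
        }
        return out;
    }

    // Every add touches one posting list per triplet of the value.
    void add(String value) {
        for (String t : trigrams(value)) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(value);
        }
    }

    // Substring search for needles of at least 3 characters.
    Set<String> search(String needle) {
        List<String> grams = trigrams(needle);
        if (grams.isEmpty()) {
            throw new IllegalArgumentException("needle shorter than 3 characters");
        }
        Set<String> result = null;
        for (String g : grams) {
            Set<String> hits = postings.getOrDefault(g, Set.of());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        // Triplet hits are only candidates; confirm with a real substring check.
        String lowered = needle.toLowerCase(Locale.ROOT);
        result.removeIf(v -> !v.toLowerCase(Locale.ROOT).contains(lowered));
        return result;
    }

    public static void main(String[] args) {
        TrigramDemo index = new TrigramDemo();
        index.add("Hello World!");
        index.add("Yellow");
        System.out.println(trigrams("Hello World!"));
        System.out.println(index.search("ello"));
    }
}
```

The same index serves "abc*", "*abc" and "*abc*" filters alike, at the price of many index writes per added value.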

        akiran Kiran Ayyagari added a comment -

        I don't think this index will be helpful in cases like "*abc*", and this is the case I have tested with.

        elecharny Emmanuel Lecharny added a comment -

        The fix Kiran applied is just an optimization: it avoids a full scan by using the indexed attribute.

        However, we can do something better in this specific case: build a reverse index for "*abc" kinds of substring. That should be neither a real problem nor a complex modification, and the speedup would be really huge.
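One way to picture such a reverse index is sketched below. It is only an illustration under assumptions: the class and method names are hypothetical, a java.util.TreeSet stands in for the real index structure, and only values are stored (a real index would map values to entry identifiers). Each value is stored reversed, so a "*678" suffix filter becomes a prefix range scan for "876" over the reversed keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class ReverseIndexDemo {
    // Sorted set of reversed attribute values (hypothetical stand-in for an index).
    private final NavigableSet<String> reversed = new TreeSet<>();

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Each add costs one extra index write for the reversed form.
    void add(String value) {
        reversed.add(reverse(value));
    }

    // "*suffix" on the original values is "prefix*" on the reversed values.
    List<String> endsWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> hits = new ArrayList<>();
        // prefix + '\uffff' serves as an upper bound for the matching range.
        for (String r : reversed.subSet(prefix, true, prefix + Character.MAX_VALUE, false)) {
            hits.add(reverse(r)); // restore the original value
        }
        return hits;
    }

    public static void main(String[] args) {
        ReverseIndexDemo index = new ReverseIndexDemo();
        index.add("+49(777)12345678");
        index.add("+49(777)12345999");
        index.add("+49(888)55500678");
        System.out.println(index.endsWith("678"));
    }
}
```

This covers "*abc" filters only; a filter with wildcards on both sides still needs a different strategy.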

        akiran Kiran Ayyagari added a comment -

        Committed an optimization here http://svn.apache.org/r1611344.


          People

          • Assignee: akiran Kiran Ayyagari
          • Reporter: Ernie4711 Ernst Bech
          • Votes: 0
          • Watchers: 3