Directory ApacheDS: DIRSERVER-1459

Adding members to a groupOfNames results in polynomial increase in JDBM partition size

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.5.5
    • Fix Version/s: 1.5.6
    • Component/s: None
    • Labels:
      None
    • Environment:
      Any (tested on Linux and Mac OS X)

      Description

      I noticed a polynomial increase in JDBM partition size, and therefore disk usage, when adding users to groups in my ApacheDS instance. The vast majority of the usage (95+% once you hit a couple thousand users) is in workingDirectory/partitionId/master.db.

      Further testing showed that simply adding a user is linear, as one would expect, and as 'apacheds-tools capacity' confirms. It is only when a user is made a member of a group that the JDBM partition size shoots up.

      Example statistics:
      Add 16,000 users - JDBM partition size = ~70 megabytes
      Now add those same 16,000 users to a single group (all in the same group) - JDBM partition size = ~19 GIGABYTES

      I'll work to attach a test case and some more numbers from my tests

      1. screenshot-1.jpg
        17 kB
        Ben Hoyt
      2. DIRSERVER-1459.tar.gz
        7 kB
        Ben Hoyt

          Activity

          Emmanuel Lecharny made changes -
          Status: Resolved [ 5 ] → Closed [ 6 ]
          Kiran Ayyagari made changes -
          Link: This issue incorporates DIRSERVER-1471 [ DIRSERVER-1471 ]
          Kiran Ayyagari made changes -
          Status: In Progress [ 3 ] → Resolved [ 5 ]
          Resolution: Fixed [ 1 ]
          Kiran Ayyagari added a comment -

          A temporary workaround [1] has been applied until we figure out the right fix.
          [1] http://svn.apache.org/viewvc?rev=912634&view=rev

          Kiran Ayyagari made changes -
          Assignee: Alex Karasulu [ akarasulu ] → Kiran Ayyagari [ akiran ]
          Emmanuel Lecharny added a comment -

          We may have a workaround in trunk. I have applied a patch proposed by Kiran, and after injecting 11 000 DNs into a memberOf attribute, the MasterTable has grown to a reasonable 3 MB.

          However, it takes a hell of a long time to inject data this way, and it takes longer and longer as we add more entries. For instance, adding the first 100 DNs cost only 400 ms, but adding 100 DNs when we already have 11 500 DNs costs more than 20 seconds.

          I have also applied a patch to partially fix this problem (without this patch, we would be talking not about 20 seconds but more probably 20 minutes!), but many inefficient operations are still done when modifying an entry (10 lookups are done, representing 2/3 of the time needed to process the modification).

          We will first apply the two patches, and keep investigating.

          Ben Hoyt added a comment -

          Thank you for the update, I appreciate it.

          Emmanuel Lecharny added a comment -

          Hi Ben,

          As it happens, we discussed this exact issue just this morning. Alex is currently working on it, but it seems to be a bug in the backend we are using. We are doing our best to determine which part of the code is causing this issue, and we have been able to write a small piece of code that demonstrates the bug outside the server.

          It will probably take a bit more time, I have no idea how much, but rest assured that we realize how big the bug is. We want to get it fixed, and will probably release another version before 2.0 as soon as the bug is fixed.

          Sorry for the delay... Life is not that simple :/

          Ben Hoyt added a comment -

          I know you guys were originally targeting near the end of January for this fix + others in 1.5.6. Is there something in the near future? I need to make a decision on ApacheDS vs. OpenLDAP for our larger installations based on this issue soon. Thanks for any updates and for looking at this issue.

          Emmanuel Lecharny added a comment -

          Just thinking out loud here:

          Can't we sidestep the problem by simply doing a delete/add operation instead of a modify, since Add and Delete on JDBM have proved to work correctly without increasing the database size?

          Alex Karasulu added a comment -

          I've been working on this issue for some time and have found out exactly why it is happening. First, this is a general problem that occurs regardless of uniqueMember attributes. It occurs because JDBM is unable to reclaim blocks used by secondary BTree objects that store large numbers of values for duplicate keys. The JdbmTable wrapper is designed to switch, at some configurable threshold, to using embedded BTrees to store values instead of serializing a TreeSet/ArraySet container data structure.

          When this happens and a nested BTree is used, the space used cannot be reclaimed by JDBM when the BTree is deleted. In addition, modify operations are for some reason deleting and re-adding the values, causing the data to be rewritten. This might be due to yet another issue altogether in the design of the modify operation, and will need to be investigated. However, freeing the blocks used by embedded secondary BTrees for duplicate-key value storage is a must.

          I am looking at JDBM right now to see how we can make sure this does in fact happen properly. This is not a small matter and is critical for proper operation. I will keep you informed as this progresses.
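
The effect described in this comment can be illustrated with a toy model. The sketch below is not JDBM code: OrphanedTreeDemo, ToyStore, and the page-counting scheme are hypothetical names invented for illustration. It assumes each modify writes a fresh nested tree holding all n current values, and compares the file size when the previous tree's pages are, or are not, returned to a free list.

```java
import java.util.HashMap;
import java.util.Map;

class OrphanedTreeDemo {

    /** Toy page store (hypothetical, NOT the JDBM API): tracks file size in pages. */
    static class ToyStore {
        private final Map<Long, Integer> live = new HashMap<>(); // record id -> pages held
        private long nextId = 1;
        private long filePages = 0; // pages ever appended to the file
        private long freePages = 0; // pages available for reuse

        /** Allocates a record, reusing free pages before growing the file. */
        long allocate(int pages) {
            long reused = Math.min(freePages, pages);
            freePages -= reused;
            filePages += pages - reused;
            live.put(nextId, pages);
            return nextId++;
        }

        /** Returns a record's pages to the free list. */
        void free(long recId) {
            Integer pages = live.remove(recId);
            if (pages != null) {
                freePages += pages;
            }
        }

        long filePages() {
            return filePages;
        }
    }

    /**
     * Simulates `modifications` modify operations. The n-th one writes a new
     * nested tree of n pages (assumption: one page per value), then either
     * frees the previous tree or leaves it orphaned in the file.
     */
    static long run(int modifications, boolean freeOldTree) {
        ToyStore store = new ToyStore();
        long current = -1;
        for (int n = 1; n <= modifications; n++) {
            long replacement = store.allocate(n);
            if (current != -1 && freeOldTree) {
                store.free(current);
            }
            current = replacement;
        }
        return store.filePages();
    }

    public static void main(String[] args) {
        System.out.println("old trees orphaned: " + run(1000, false) + " pages");
        System.out.println("old trees freed:    " + run(1000, true) + " pages");
    }
}
```

Under these assumptions the file grows as n(n+1)/2 pages when old trees are orphaned, and as 2n - 1 pages when they are freed, which is the difference between polynomial and linear growth.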

          Alex Karasulu made changes -
          Fix Version/s: 1.5.6 [ 12314538 ]
          Fix Version/s: 2.0.0-RC1 [ 12313387 ]
          Alex Karasulu added a comment -

          Creating new release 1.5.6 just so we can get critical bugs like this one out. Shooting for January 31st depending on the bug parade.

          <OT>
          Very nice issue filing btw. I like the description and the code. Would be nice if everyone could submit issues like this. Hence why I want to react rapidly to get you a fix.
          </OT>

          Just to let others know what I think is causing this: a low-level bug in JdbmTable, where the secondary BTree used for the values of duplicate-key table entries is not being deleted. So a modify deletes the key and orphans the secondary table, whose data stays in the JDBM file as a separate tree. It is not being properly cleaned up on LDAP entry modify operations, and this especially impacts the master.db file. It would also produce huge uniqueMember.db files if we indexed this attribute.

          The growth would look polynomial based on this hypothesis. I will update this issue to show which tests are being used after getting them into the repository.
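
As a rough editorial check on this hypothesis (an estimate under stated assumptions, not a measurement from the issue): suppose group members are added one per modify, the i-th modify writes a fresh copy of all i member values, each value costs about s bytes on disk, and no old copy is ever reclaimed. Taking the reported 19 GB as 19 × 10^9 bytes for N = 16,000 members:

```latex
S(N) = \sum_{i=1}^{N} i\,s = s\,\frac{N(N+1)}{2}
\qquad\Longrightarrow\qquad
s \approx \frac{2\,S(N)}{N(N+1)}
  = \frac{2 \times 19 \times 10^{9}\ \text{bytes}}{16\,000 \times 16\,001}
  \approx 148\ \text{bytes}
```

Roughly 150 bytes per stored member value is in the range one might expect for a serialized DN plus overhead, so the reported figures are at least consistent with quadratic growth from orphaned copies. The one-value-per-modify pattern and the uniform value size are assumptions here.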

          Alex Karasulu made changes -
          Status: Open [ 1 ] → In Progress [ 3 ]
          Alex Karasulu made changes -
          Assignee: Alex Karasulu [ akarasulu ]
          Emmanuel Lecharny added a comment -

          I have added two tests to demonstrate the problem.

          Note: they are @Ignored, so they must be enabled before they will run.
          http://svn.apache.org/viewvc?rev=901814&view=rev

          Emmanuel Lecharny added a comment -

          This is not a problem with the Add operation. If I add 10 000 values to the attribute, the master.db size grows to 1 MB, no more.

          It's a problem with the Modify operation: adding a value 500 times makes master.db grow to 17 MB.

          PS: I have created new tests, so there is no need to send me the SimpleRegression class.
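
To make the Add-versus-Modify distinction concrete on the client side, here is a minimal sketch using the standard JNDI classes in javax.naming.directory. The class name GroupMemberMods, the method names, and the example DNs are made up for illustration; the attribute name member is the one groupOfNames uses. The code only builds modification requests and does not talk to a server.

```java
import java.util.Arrays;
import java.util.List;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.ModificationItem;

/** Hypothetical helper: builds LDAP modifications that add group members. */
class GroupMemberMods {

    /** One modification adding a single member value (one Modify per user). */
    static ModificationItem[] addOne(String memberDn) {
        BasicAttribute member = new BasicAttribute("member", memberDn);
        return new ModificationItem[] {
            new ModificationItem(DirContext.ADD_ATTRIBUTE, member)
        };
    }

    /** One modification adding every member value at once (a single Modify). */
    static ModificationItem[] addAll(List<String> memberDns) {
        BasicAttribute member = new BasicAttribute("member");
        for (String dn : memberDns) {
            member.add(dn);
        }
        return new ModificationItem[] {
            new ModificationItem(DirContext.ADD_ATTRIBUTE, member)
        };
    }

    public static void main(String[] args) {
        // Example DNs are placeholders, not taken from the issue.
        List<String> dns = Arrays.asList(
            "uid=user1,ou=users,dc=example,dc=com",
            "uid=user2,ou=users,dc=example,dc=com");
        System.out.println(addOne(dns.get(0))[0]);
        System.out.println(addAll(dns)[0]);
    }
}
```

Each array would be sent with DirContext.modifyAttributes(groupDn, mods). Calling addOne once per user issues one Modify per member, which is the pattern that exposed the growth; addAll issues a single Modify. Whether batching avoids the growth on an affected server is not established in this issue.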

          Ben Hoyt added a comment -

          That class is from commons-math. I added it as a dependency in the pom.xml inside the attached archive. I also tweaked the pom to raise the memory for the test to 1 GB. You could either run the test directly from the pom, or add commons-math to your classpath (and maybe bump up your memory setting, however you are running it).
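
One way to put a number on growth like this without depending on commons-math is to fit a straight line to ln(size) versus ln(n): the slope estimates the exponent k in size ≈ c·n^k. The sketch below is a plain least-squares fit written for illustration. It is not the attached test, and GrowthExponent and the sample data in main are hypothetical.

```java
/** Hypothetical helper: estimates the growth exponent from (n, size) samples. */
class GrowthExponent {

    /**
     * Least-squares slope of ln(size) against ln(n).
     * A result near 1 suggests linear growth, near 2 quadratic growth.
     * All inputs must be positive, and at least two distinct n values are needed.
     */
    static double estimate(double[] n, double[] size) {
        if (n.length != size.length || n.length < 2) {
            throw new IllegalArgumentException("need at least two paired samples");
        }
        double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (int i = 0; i < n.length; i++) {
            double x = Math.log(n[i]);
            double y = Math.log(size[i]);
            sumX += x;
            sumY += y;
            sumXX += x * x;
            sumXY += x * y;
        }
        int m = n.length;
        return (m * sumXY - sumX * sumY) / (m * sumXX - sumX * sumX);
    }

    public static void main(String[] args) {
        // Synthetic data for demonstration only: size = 3 * n^2.
        double[] n = {1000, 2000, 4000, 8000};
        double[] size = new double[n.length];
        for (int i = 0; i < n.length; i++) {
            size[i] = 3.0 * n[i] * n[i];
        }
        System.out.printf("estimated exponent: %.3f%n", estimate(n, size));
    }
}
```

Feeding it partition sizes measured at several user counts would give a single figure to compare before and after a fix.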

          Emmanuel Lecharny added a comment -

          Class SimpleRegression is missing in the submitted test...

          Emmanuel Lecharny made changes -
          Fix Version/s: 2.0.0-RC1 [ 12313387 ]
          Priority: Major [ 3 ] → Blocker [ 1 ]
          Emmanuel Lecharny added a comment -

          Confirmed. Raised to blocker. Will work on it asap.

          Ben Hoyt made changes -
          Attachment: DIRSERVER-1459.tar.gz [ 12430905 ]
          Ben Hoyt added a comment -

          Attached is a JUnit test inside a Maven project that exposes this problem. I used the 1.5.5 archetype that was created to allow for easy test case creation. The test class is DirServer1459Test.java.

          Ben Hoyt made changes -
          Field Original Value New Value
          Attachment: screenshot-1.jpg [ 12430809 ]
          Ben Hoyt added a comment -

          Partition size on disk (y axis) per number of users who are also in a group (x axis).

          Ben Hoyt created issue -

            People

            • Assignee:
              Kiran Ayyagari
              Reporter:
              Ben Hoyt
            • Votes:
              0
              Watchers:
              2
