Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.3.0
    • Component/s: studio-ldapbrowser
    • Labels:
      None
    • Environment:
      Win2k workstation, java 1.6.0_05

      Description

      Windows servers have a default server side limit of 1000 returned objects. AFAIK the normal way of handling this is to detect that a search returns an incomplete result set and make further requests to span the full result set.

      There appears to be no such capability in dirstudio which makes searching 15K users extremely messy.
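The standard way to span a large result set is the simple paged results control (RFC 2696), which Active Directory implements: the client requests a page, reads back a cookie from the response, and resends the cookie until it comes back empty. A minimal JNDI sketch of that loop follows (the host, base DN, and filter are placeholders, and error handling is omitted):

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import javax.naming.ldap.Control;
import javax.naming.ldap.InitialLdapContext;
import javax.naming.ldap.LdapContext;
import javax.naming.ldap.PagedResultsControl;
import javax.naming.ldap.PagedResultsResponseControl;

public class PagedSearchSketch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ad.example.com:389"); // placeholder host
        LdapContext ctx = new InitialLdapContext(env, null);

        int pageSize = 1000; // stay at or under AD's default MaxPageSize
        byte[] cookie = null;
        ctx.setRequestControls(new Control[] { new PagedResultsControl(pageSize, Control.CRITICAL) });
        do {
            NamingEnumeration<SearchResult> results =
                ctx.search("dc=example,dc=com", "(objectClass=user)", new SearchControls());
            while (results.hasMore()) {
                SearchResult entry = results.next();
                // process entry ...
            }
            // Read the server's cookie; a non-empty cookie means more pages remain.
            cookie = null;
            Control[] controls = ctx.getResponseControls();
            if (controls != null) {
                for (Control c : controls) {
                    if (c instanceof PagedResultsResponseControl) {
                        cookie = ((PagedResultsResponseControl) c).getCookie();
                    }
                }
            }
            // Send the cookie back to request the next page.
            ctx.setRequestControls(new Control[] {
                new PagedResultsControl(pageSize, cookie, Control.CRITICAL) });
        } while (cookie != null && cookie.length > 0);
        ctx.close();
    }
}
```

This is the pattern the Sun JNDI tutorial linked later in this thread demonstrates; Studio's own implementation may differ.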

        Activity

        Jim Birch added a comment -

        Stefan

        Your analysis of what I did is right. Thanks for the memory info. If I need to dump any big lumps of data I can go via CSV, or bump the heap right up for the job. I've got a few Gb to play with on my desktop.

        Thanks for your efforts. DirStudio is really well thought out and put together, a pleasure to use. All that at only V1.3! It has been a brilliant tool for the data analysis side of the project I'm working on: identity management and provisioning of users in AD and various other applications using HR data exports. Keep up the good work.

        Regards, Jim

        Stefan Seelmann added a comment -

        Fixed here: http://svn.apache.org/viewcvs?view=rev&rev=691070

        Stefan Seelmann added a comment -

        I think I was able to reproduce what you did: you first searched 13K entries and they were displayed in the search result editor, then you exported them, right?

        Performing a search with 10K results and displaying them in the search result editor costs about 50MB of memory. The search results are cached in memory as long as your connection is open. 5kB per entry sounds like a lot; it consists of our internal object model (entry, DN, RDN, attributes, values, parent-child relationships) and the UI objects used to display that model (tables, rows, columns, fonts).

        Exporting 10K entries to Excel costs about another 50MB of memory. We use the Apache POI library for that, and it must build the Excel file in memory.

        Exporting to LDIF or CSV is cheaper because each entry received from the server is immediately streamed to the file. (With LDIF the CPU usage is too high, I need to check that...)
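The streaming approach described above can be sketched roughly like this: write each entry to disk as it arrives, so memory use stays flat regardless of result-set size. This is a hypothetical helper for illustration, not Studio's actual export code, and it emits only a simplified LDIF (no base64 encoding or line folding):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import javax.naming.NamingEnumeration;
import javax.naming.directory.Attribute;
import javax.naming.directory.SearchResult;

public class LdifStreamExport {
    // Streams each entry to the file immediately instead of
    // accumulating the whole result set (as an Excel export must).
    static void export(NamingEnumeration<SearchResult> results, String path) throws Exception {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(path))) {
            while (results.hasMore()) {
                SearchResult entry = results.next();
                out.write("dn: " + entry.getName()); // relative name; a real export would use the full DN
                out.newLine();
                NamingEnumeration<? extends Attribute> attrs = entry.getAttributes().getAll();
                while (attrs.hasMore()) {
                    Attribute attr = attrs.next();
                    for (int i = 0; i < attr.size(); i++) {
                        out.write(attr.getID() + ": " + attr.get(i));
                        out.newLine();
                    }
                }
                out.newLine(); // blank line separates LDIF records
            }
        }
    }
}
```

Only one entry is ever held in memory at a time, which is why LDIF/CSV export scales where the in-memory POI workbook does not.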

        So 40MB (Studio/Eclipse footprint) + 50MB (10,000 search results) + 50MB (Excel export) is more than the 128MB default heap size.

        So here is what I would suggest (where appropriate):

        • you already increased the heap memory
        • only perform small searches within Studio, using the count limit and/or paged search
        • if you need to export large data:
          • use CSV (you already do) or LDIF
          • if you need Excel, first close all connections (to flush the caches), open the right connection, and run the export without performing a search

        Hm, perhaps we should start a new process for each search and export, like g**gle does with chr*me

        Stefan Seelmann added a comment -

        Hi Jim,

        glad to hear that it works.

        The problem with the Excel export is that we need to hold all the data in memory. However, 1540 items should not cause memory issues; I'll investigate. Could you please tell me your heap size settings before and after increasing it?

        Thanks,
        Stefan

        Jim Birch added a comment -

        I'm on Sun J2SE 1.6.0_07 on Windows XP. There's a doc on setting the heap size and other stuff here: http://directory.apache.org/studio/faqs.html (covers Linux too). Basically it's an ini file alongside the DirStudio exe.

        Is that what you want?
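        For anyone following along: the launcher ini Jim mentions follows the usual Eclipse RCP convention, where JVM arguments go after a `-vmargs` line. A sketch (the exact file name and the right `-Xmx` value depend on your Studio version and workload):

```ini
-vmargs
-Xms128m
-Xmx512m
```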

        Emmanuel Lecharny added a comment -

        Which JVM are you using? With which flags?

        Jim Birch added a comment - edited

        Thanks Stefan,

        Browsing the structure works with no problems.

        (CORRECTION) Searching is OK too; I needed to set paged search and bump the heap size.

        I also noticed that my Excel export slowed to a crawl, then ran out of Java heap space after around 1540 items out of a 13K list. There's a warning on that, so it's to be expected. I could bump the heap space, but CSV works OK - probably a better way to go anyway.

        (MORE) After increasing heap space I got to about 20K entries on the Excel export before a heap space error. C'est la vie.

        I'm happy for you to close this issue. It's working for me.

        Regards
        Jim

        Stefan Seelmann added a comment -

        Hi Jim, I just added support for the paged results control. Feel free to test the nightly build and report whether it fits your needs or if you need improvements.

        Jim Birch added a comment -

        Any chance of getting this issue bumped into a close release? I'm kinda in love with DirStudio - she's beautiful - but this is putting pressure on the relationship.

        The alternative workaround of bumping the server-side limit is considered very bad practice: the setting applies across all domain controllers in the Active Directory domain, so it leaves a lot of targets open to a DoS attack in a corporate environment. Everything I've seen warns against it, e.g. http://searchwindowsserver.techtarget.com/tip/0,289483,sid68_gci1265206,00.html, and some management tools automatically flag a server-side limit above 1000 as a problem.

        I guess there are a lot of other AD LDAP users who could use the tool if it had paged search support. There are a few here.

        I'm not sure about the ApacheDS design philosophy, but paged search might be a good idea for ApacheDS too.

        The code changes look well-contained and pretty easy to implement, and they would make the LDAP implementation more complete. There's some sample code here, if it helps: http://java.sun.com/docs/books/tutorial/jndi/newstuff/paged-results.html.

        I wouldn't want to mess with the code myself but I'm happy to do some AD testing here.

        Emmanuel Lecharny added a comment -

        There is a 'paged control' RFC (http://www.ietf.org/rfc/rfc2696.txt) which is implemented by AD.

        We just have to implement it in Studio (and add an option to set it up).


          People

          • Assignee:
            Stefan Seelmann
            Reporter:
            Jim Birch
          • Votes:
            0
            Watchers:
            0
