Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-475

RAMDirectory(Directory dir, boolean closeDir) constructor uses memory inefficiently.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/store
    • Labels:
      None

      Description

      recently I found that RAMDirectory(Directory dir, boolean closeDir) constructor uses memory inefficiently.
      files from source index are read entirely intro memory as single byte array which is after all is thrown away. And if I want to load my 200M optimized, compound format index to memory for faster search I should give JVM at least 400Mb memory limit. For larger indexes this can be an issue.

      I've attached patch how to solve this problem.

      1. RamDirectory.diff
        1 kB
        Volodymyr Bychkoviak
      2. RamDirectory2.diff
        1 kB
        Volodymyr Bychkoviak

        Activity

        Hide
        vbychkoviak Volodymyr Bychkoviak added a comment -

        Oh, I didn't described my solution...
        Solution is simle: use 1k buffer to copy files from source index to ram files

        Show
        vbychkoviak Volodymyr Bychkoviak added a comment - Oh, I didn't described my solution... Solution is simle: use 1k buffer to copy files from source index to ram files
        Hide
        hossman Hoss Man added a comment -

        I'm no expert on File I/O, so i don't know if this is/isn't a good change to make, but i deplore constants:

        If this patch is a good idea, then i'd like to suggest that instead of a static constant for the buffer size, a new constructor arg be added specifying the buffer size, with the default (ie: no value, or "0" specified) being to use the old behavior (ie: a buffer the same size as the file)

        Show
        hossman Hoss Man added a comment - I'm no expert on File I/O, so i don't know if this is/isn't a good change to make, but i deplore constants: If this patch is a good idea, then i'd like to suggest that instead of a static constant for the buffer size, a new constructor arg be added specifying the buffer size, with the default (ie: no value, or "0" specified) being to use the old behavior (ie: a buffer the same size as the file)
        Hide
        bernhard.messer@intrafind.de Bernhard Messer added a comment -

        I like the patch and find it very helpful if one tries to load larger indices into RAMDirectory.

        Hoss Man,

        why do you would like to have a new constructor to adjust the internal buffer size. I do not see any reason to make the buffersize configurable from outside. The tests i made with different sizes didn't show any difference on performace or disk usage. The new implementation would be similar to BufferedIndexOutput where the internal buffer size couldn't be changed either. Do i miss something ?

        Show
        bernhard.messer@intrafind.de Bernhard Messer added a comment - I like the patch and find it very helpful if one tries to load larger indices into RAMDirectory. Hoss Man, why do you would like to have a new constructor to adjust the internal buffer size. I do not see any reason to make the buffersize configurable from outside. The tests i made with different sizes didn't show any difference on performace or disk usage. The new implementation would be similar to BufferedIndexOutput where the internal buffer size couldn't be changed either. Do i miss something ?
        Hide
        hossman Hoss Man added a comment -

        Using different buffer sizes may not make a big difference in your performacne tests – but that doesn't mean it won't mke a differnece for other people running with different heap sizees, or on different hardware.

        As i understand it, the point of this patch is to give people the ability to use memory more efficiently, and I applaud that – but why not go all out and let users control he buffer size so they can tune it to their hearts content?

        Show
        hossman Hoss Man added a comment - Using different buffer sizes may not make a big difference in your performacne tests – but that doesn't mean it won't mke a differnece for other people running with different heap sizees, or on different hardware. As i understand it, the point of this patch is to give people the ability to use memory more efficiently, and I applaud that – but why not go all out and let users control he buffer size so they can tune it to their hearts content?
        Hide
        cutting Doug Cutting added a comment -

        Why not just use BufferedIndexOutput.BUFFER_SIZE, a constant declared elsewhere and already used for all RAMDirectory buffers? That way you're not introducing any new constants and disturbing the constant averse...

        And I disagree that the default behaviour should be as before: there's no good reason to buffer the entire file. That was a bug that this patch fixes.

        Show
        cutting Doug Cutting added a comment - Why not just use BufferedIndexOutput.BUFFER_SIZE, a constant declared elsewhere and already used for all RAMDirectory buffers? That way you're not introducing any new constants and disturbing the constant averse... And I disagree that the default behaviour should be as before: there's no good reason to buffer the entire file. That was a bug that this patch fixes.
        Hide
        alex+news@olmisoft.com Alexey Panchenko added a comment -

        byte[] buf = new byte[READ_BUFFER_SIZE];
        should be moved before the for-loop, so the single buffer can be used to load all the files.

        Show
        alex+news@olmisoft.com Alexey Panchenko added a comment - byte[] buf = new byte [READ_BUFFER_SIZE] ; should be moved before the for-loop, so the single buffer can be used to load all the files.
        Hide
        vbychkoviak Volodymyr Bychkoviak added a comment -

        second version of patch which uses BufferedIndexOutput.BUFFER_SIZE as buffer size and same buffer for all files.

        Show
        vbychkoviak Volodymyr Bychkoviak added a comment - second version of patch which uses BufferedIndexOutput.BUFFER_SIZE as buffer size and same buffer for all files.
        Hide
        kuhn Jiri Kuhn added a comment -

        Wouldn't it be nice to have special method to copy two directories? Like:

        public abstract class Directory {
        ...
        public static void copyDirectory(Directory from, Directory to) throws IOException

        { // patched code here }

        }

        Because I have different problem how to flush RAMDirectory to FSDirectory. The copy operation is more general to be only in RAMDirectory.

        Show
        kuhn Jiri Kuhn added a comment - Wouldn't it be nice to have special method to copy two directories? Like: public abstract class Directory { ... public static void copyDirectory(Directory from, Directory to) throws IOException { // patched code here } } Because I have different problem how to flush RAMDirectory to FSDirectory. The copy operation is more general to be only in RAMDirectory.
        Hide
        vbychkoviak Volodymyr Bychkoviak added a comment -

        agree

        Show
        vbychkoviak Volodymyr Bychkoviak added a comment - agree
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        super minor nit:

        toRead = Math.min(len-readCount, BufferedIndexOutput.BUFFER_SIZE)

        is easier on my eyes than

        + int toRead = readCount + BufferedIndexOutput.BUFFER_SIZE > len ? len - readCount : BufferedIndexOutput.BUFFER_SIZE;

        Show
        yseeley@gmail.com Yonik Seeley added a comment - super minor nit: toRead = Math.min(len-readCount, BufferedIndexOutput.BUFFER_SIZE) is easier on my eyes than + int toRead = readCount + BufferedIndexOutput.BUFFER_SIZE > len ? len - readCount : BufferedIndexOutput.BUFFER_SIZE;
        Hide
        bernhard.messer@intrafind.de Bernhard Messer added a comment -

        Volodymyr,

        thanks for the fix. It's reviewed and commited.

        Bernhard

        Show
        bernhard.messer@intrafind.de Bernhard Messer added a comment - Volodymyr, thanks for the fix. It's reviewed and commited. Bernhard

          People

          • Assignee:
            Unassigned
            Reporter:
            vbychkoviak Volodymyr Bychkoviak
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development