Lucene - Core
  1. Lucene - Core
  2. LUCENE-5164

Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Followup from LUCENE-5161:
      In former times we added the OOM cactching in NIOFSDir and SimpleFSDir because nobody understand why the OOM could happen on FileChannel.read() or SimpleFSDir.read(). By reading the Java code its easy to understand (it allocates direct buffers with same size as the requested length to read). As we have chunking now reduce to a few kilobytes it cannot happen anymore that we get spurious OOMs.

      In fact we might hide a real OOM! So we should remove it.

      I am also not sure if we should make chunk size configureable in FSDirectory at all! It makes no sense to me (it was in fact only added for people that hit the OOM to fine-tune).

      In my opinion we should remove the setter in trunk and keep it deprecated in 4.x. The buf size is then in trunk equal to the defaults from LUCENE-5161.

      1. LUCENE-5164.patch
        24 kB
        Uwe Schindler
      2. LUCENE-5164.patch
        24 kB
        Uwe Schindler
      3. LUCENE-5164.patch
        24 kB
        Uwe Schindler
      4. LUCENE-5164.patch
        19 kB
        Uwe Schindler
      5. LUCENE-5164.patch
        9 kB
        Uwe Schindler
      6. LUCENE-5164-4x.patch
        58 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          +1 to remove the OOM-catching and remove the setter completely from trunk!

          Show
          Robert Muir added a comment - +1 to remove the OOM-catching and remove the setter completely from trunk!
          Hide
          Uwe Schindler added a comment -

          We should also make SimpleIndexOutput use the chunk size, too. Because it uses the same code in the JDK (it allocates direct buffer before writing!)

          Show
          Uwe Schindler added a comment - We should also make SimpleIndexOutput use the chunk size, too. Because it uses the same code in the JDK (it allocates direct buffer before writing!)
          Hide
          Uwe Schindler added a comment - - edited

          Here is the code of RandomAccessFile's native part that explains why the OOM happens there: http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/dffaa68042cd/src/share/native/java/io/io_util.c

          In fact, when the byte[] is larger than 8 Kilobytes it mallocs a buffer. Otherwise it use a buffer from call stack. This malloc has the same effect as the direct buffer in FileChannel.read().

          So we really need the same chunking for write and set the chunk size to 8 Kilobytes W8hcih is done in Robert's previous commit), just not explicit! So we should insert 8192 as number into FSDirectory.

          Show
          Uwe Schindler added a comment - - edited Here is the code of RandomAccessFile's native part that explains why the OOM happens there: http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/dffaa68042cd/src/share/native/java/io/io_util.c In fact, when the byte[] is larger than 8 Kilobytes it mallocs a buffer. Otherwise it use a buffer from call stack. This malloc has the same effect as the direct buffer in FileChannel.read(). So we really need the same chunking for write and set the chunk size to 8 Kilobytes W8hcih is done in Robert's previous commit), just not explicit! So we should insert 8192 as number into FSDirectory.
          Hide
          Uwe Schindler added a comment - - edited

          Patch:

          • deprecates the readChunk size
          • uses the chunck size also for writing in FSIndexOutput
          • unfucks the read loops in Simple and NIO to be easier to understand (uses Math.min instead crazy ifs).
          • unfucks FSIndexOutput.close() and use IOUtils.
          Show
          Uwe Schindler added a comment - - edited Patch: deprecates the readChunk size uses the chunck size also for writing in FSIndexOutput unfucks the read loops in Simple and NIO to be easier to understand (uses Math.min instead crazy ifs). unfucks FSIndexOutput.close() and use IOUtils.
          Hide
          Uwe Schindler added a comment -

          More cleanups.

          Show
          Uwe Schindler added a comment - More cleanups.
          Hide
          Uwe Schindler added a comment -

          Fix wrong visibility of flushBuffer method in FSIndexOutput

          Show
          Uwe Schindler added a comment - Fix wrong visibility of flushBuffer method in FSIndexOutput
          Hide
          Uwe Schindler added a comment -

          Hide the extra FSIndexOutput constructor again, as it is only used to support deprecated chunkSize.

          Show
          Uwe Schindler added a comment - Hide the extra FSIndexOutput constructor again, as it is only used to support deprecated chunkSize.
          Hide
          Uwe Schindler added a comment -

          This is already a fixed buffer in base class!

          Show
          Uwe Schindler added a comment - This is already a fixed buffer in base class!
          Hide
          Uwe Schindler added a comment -

          New patch:

          • Remves the chunking in FSIndexOutput
          • Lower BufferedIndexOutput.DEFAULT_BUFFER_SIZE to 8192 bytes: This prevents RandomAccessFile from malloc on every write!
          • Unfuck the read loops in SimpleFSDir and NIOFSDir
          • Use IOUtils when close FSIndexOutput
          Show
          Uwe Schindler added a comment - New patch: Remves the chunking in FSIndexOutput Lower BufferedIndexOutput.DEFAULT_BUFFER_SIZE to 8192 bytes: This prevents RandomAccessFile from malloc on every write! Unfuck the read loops in SimpleFSDir and NIOFSDir Use IOUtils when close FSIndexOutput
          Hide
          Uwe Schindler added a comment -

          In my opinion, the chunking should be done in the base class (BufferedIndexInput) already, so we an remove the chunking in NIOFS and SimpleFS. There is no reason to have the code here.

          I will work later on that.

          Show
          Uwe Schindler added a comment - In my opinion, the chunking should be done in the base class (BufferedIndexInput) already, so we an remove the chunking in NIOFS and SimpleFS. There is no reason to have the code here. I will work later on that.
          Hide
          Uwe Schindler added a comment -

          New patch:

          • The NIOFS read loop was further cleaned up and simplified by using the ByteBuffer tracking.
          • The setter/getter in FSDirectory are no no-ops (deprecated)
          • Every implementation has its own chunk size, which fits the underlying IO layer. For RandomAccessFile this is 8192 bytes

          I decided not to put the chunking into Buffered*, as it is still separate and complicates code of Buffered* even more.

          Show
          Uwe Schindler added a comment - New patch: The NIOFS read loop was further cleaned up and simplified by using the ByteBuffer tracking. The setter/getter in FSDirectory are no no-ops (deprecated) Every implementation has its own chunk size, which fits the underlying IO layer. For RandomAccessFile this is 8192 bytes I decided not to put the chunking into Buffered*, as it is still separate and complicates code of Buffered* even more.
          Hide
          Uwe Schindler added a comment -

          Improved test in TestDirectory that ensures if chunking is working correctly.

          This is now ready.

          Show
          Uwe Schindler added a comment - Improved test in TestDirectory that ensures if chunking is working correctly. This is now ready.
          Hide
          Uwe Schindler added a comment -

          Explicitely pass buffer size as CHUNK_SIZE to BufferedIndexOutput for FSDirectory.

          Show
          Uwe Schindler added a comment - Explicitely pass buffer size as CHUNK_SIZE to BufferedIndexOutput for FSDirectory.
          Hide
          Uwe Schindler added a comment -

          New patch again, this time with better reuse of NIOFS' ByteBuffer!

          Show
          Uwe Schindler added a comment - New patch again, this time with better reuse of NIOFS' ByteBuffer!
          Hide
          ASF subversion and git services added a comment -

          Commit 1512937 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1512937 ]

          LUCENE-5164: Fix default chunk sizes in FSDirectory to not be unnecessarily large (now 8192 bytes); also use chunking when writing to index files. FSDirectory#setReadChunkSize() is now deprecated and will be removed in Lucene 5.0

          Show
          ASF subversion and git services added a comment - Commit 1512937 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1512937 ] LUCENE-5164 : Fix default chunk sizes in FSDirectory to not be unnecessarily large (now 8192 bytes); also use chunking when writing to index files. FSDirectory#setReadChunkSize() is now deprecated and will be removed in Lucene 5.0
          Hide
          Uwe Schindler added a comment -

          Patch for 4.x (as the merging was complicated, because of many changes - Java 7)

          Show
          Uwe Schindler added a comment - Patch for 4.x (as the merging was complicated, because of many changes - Java 7)
          Hide
          ASF subversion and git services added a comment -

          Commit 1512949 from Uwe Schindler in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1512949 ]

          Merged revision(s) 1512937 from lucene/dev/trunk:
          LUCENE-5164: Fix default chunk sizes in FSDirectory to not be unnecessarily large (now 8192 bytes); also use chunking when writing to index files. FSDirectory#setReadChunkSize() is now deprecated and will be removed in Lucene 5.0

          Show
          ASF subversion and git services added a comment - Commit 1512949 from Uwe Schindler in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1512949 ] Merged revision(s) 1512937 from lucene/dev/trunk: LUCENE-5164 : Fix default chunk sizes in FSDirectory to not be unnecessarily large (now 8192 bytes); also use chunking when writing to index files. FSDirectory#setReadChunkSize() is now deprecated and will be removed in Lucene 5.0
          Hide
          ASF subversion and git services added a comment -

          Commit 1512951 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1512951 ]

          LUCENE-5164: Remove deprecated stuff in trunk.

          Show
          ASF subversion and git services added a comment - Commit 1512951 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1512951 ] LUCENE-5164 : Remove deprecated stuff in trunk.
          Hide
          Uwe Schindler added a comment -

          Thanks Robert and Grant for the fruitful discussions!

          Show
          Uwe Schindler added a comment - Thanks Robert and Grant for the fruitful discussions!
          Hide
          Adrien Grand added a comment -

          4.5 release -> bulk close

          Show
          Adrien Grand added a comment - 4.5 release -> bulk close

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development