  1. Lucene - Core
  2. LUCENE-2537

FSDirectory.copy() impl is unsafe

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/store
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      There are a couple of issues with it:

      1. FileChannel.transferFrom documents that it may not copy the number of bytes requested, however we don't check the return value. So we need to fix the code to copy in a loop until all bytes have been copied.
      2. When calling addIndexes() w/ very large segments (few hundred MBs in size), I ran into the following exception (Java 1.6 – Java 1.5's exception was cryptic):
        Exception in thread "main" java.io.IOException: Map failed
            at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
            at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
            at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
            at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
            at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
        Caused by: java.lang.OutOfMemoryError: Map failed
            at sun.nio.ch.FileChannelImpl.map0(Native Method)
            at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
            ... 7 more
        

      I changed the impl to something like this:

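      long numWritten = 0;
      long numToWrite = input.size();
      long bufSize = 1 << 26; // 64MB requested per transferFrom call
      while (numWritten < numToWrite) {
        // transferFrom may copy fewer bytes than requested, so accumulate its return value
        numWritten += output.transferFrom(input, numWritten, bufSize);
      }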
      

      And the code successfully adds the indexes. This code uses chunks of 64MB, which might be too large for some applications, so we definitely need a smaller chunk size. The question is how small it can be without affecting performance. It'd be great if we could make it configurable, but since copy() is called by other APIs, such as addIndexes, I'm not sure it's easily controllable.
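
To make the chunk-size question concrete, here is a minimal sketch of the same loop with the chunk size passed in as a parameter. The class name ChunkedCopy, the chunkSize parameter, and the zero-progress check are assumptions made for illustration; they are not taken from the attached patch or from FSDirectory itself.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene.
final class ChunkedCopy {

  /** Copies all of input into output, requesting at most chunkSize bytes per transferFrom call. */
  static long copy(FileChannel input, FileChannel output, long chunkSize) throws IOException {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
    }
    long numToWrite = input.size();
    long numWritten = 0;
    input.position(0); // transferFrom reads from the source channel's current position
    while (numWritten < numToWrite) {
      long request = Math.min(chunkSize, numToWrite - numWritten);
      // transferFrom may copy fewer bytes than requested, so use its return value
      long n = output.transferFrom(input, numWritten, request);
      if (n <= 0) {
        // Assumption: treat zero progress as a truncated source instead of looping forever
        throw new EOFException("copied " + numWritten + " of " + numToWrite + " bytes");
      }
      numWritten += n;
    }
    return numWritten;
  }

  public static void main(String[] args) throws IOException {
    Path src = Files.createTempFile("copy-src", ".bin");
    Path dst = Files.createTempFile("copy-dst", ".bin");
    try {
      Files.write(src, new byte[10_000]);
      try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
           FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
        // Deliberately small chunk so the loop runs several times
        System.out.println(copy(in, out, 4096));
      }
    } finally {
      Files.deleteIfExists(src);
      Files.deleteIfExists(dst);
    }
  }
}
```

A caller such as addIndexes could pass a fixed default here, which would keep the chunk size adjustable in one place without exposing it through every calling API. Whether that is acceptable for performance is exactly the open question above.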

      Also, I read somewhere (can't remember now where) that on Linux the native impl is better and does copy in chunks. So perhaps we should make a Linux-specific impl?

      Attachments

        1. LUCENE-2537.patch
          2 kB
          Shai Erera
        2. LUCENE-2537.patch
          10 kB
          Shai Erera
        3. FileCopyTest.java
          4 kB
          Shai Erera

          People

            Assignee: Shai Erera (shaie)
            Reporter: Shai Erera (shaie)