Lucene - Core
  1. Lucene - Core
  2. LUCENE-5678

Investigate to use FileoutputStream instead of RandomAccessFile in FSIndexOutput

    Details

    • Type: Task Task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We no longer allow seeking in IndexOutput, so there is no need to use RandomAccessFile. We can change this with a < 1 KiB patch.

      Further improvements would be to merge this with OutputStreamIndexOutput, so we get many simplifications.

      There is also no reason anymore to separate DataOutput from IndexOutput. The only additional thing is IndexOutput#getFilePointer(), which is handled by an internal counter (does not use getFilePointer of the underlying RAF) and checksums.

      1. LUCENE-5678.patch
        23 kB
        Uwe Schindler
      2. LUCENE-5678.patch
        23 kB
        Uwe Schindler
      3. LUCENE-5678.patch
        23 kB
        Uwe Schindler
      4. LUCENE-5678.patch
        14 kB
        Uwe Schindler
      5. LUCENE-5678.patch
        14 kB
        Uwe Schindler
      6. LUCENE-5678.patch
        2 kB
        Uwe Schindler
      7. LUCENE-5678-4x.patch
        34 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment - - edited

          Very simple patch.

          Michael McCandless: It would be good to compare performance as a first review. We can then merge this with OutputStreamDataOutput. An alternative would be to nuke BufferedIndexOutput completely and use BufferedOutputStream in combinations with java.util.zip.CheckedOutputStream (for the checksum)!

          Show
          Uwe Schindler added a comment - - edited Very simple patch. Michael McCandless : It would be good to compare performance as a first review. We can then merge this with OutputStreamDataOutput. An alternative would be to nuke BufferedIndexOutput completely and use BufferedOutputStream in combinations with java.util.zip.CheckedOutputStream (for the checksum)!
          Hide
          Michael McCandless added a comment -

          I tested index time for full Wikipedia; it's output intensive, and it looks like no perf change w/ the patch, though the numbers are a little noisy from run to run ...

          Show
          Michael McCandless added a comment - I tested index time for full Wikipedia; it's output intensive, and it looks like no perf change w/ the patch, though the numbers are a little noisy from run to run ...
          Hide
          Uwe Schindler added a comment -

          Hi,

          I cleaned up most of the code. This now makes BufferedIndexOutput obsolete (once I fixed RateLimiter, which buffers a second time!!! horrible!!!).

          But before I do this, we should check the perf, because this is now completely different code.

          I also fixed HdfsDirectory to use this new class, too.

          The only remaining use of BufferedIndexOutput is in RateLimiter. I think we should plug the rate limiter deeper on the OutputStream level in future (subclass BufferedOutputStream to limit rate) and allow to plug that into the FSDir impl.

          Show
          Uwe Schindler added a comment - Hi, I cleaned up most of the code. This now makes BufferedIndexOutput obsolete (once I fixed RateLimiter, which buffers a second time!!! horrible!!!). But before I do this, we should check the perf, because this is now completely different code. I also fixed HdfsDirectory to use this new class, too. The only remaining use of BufferedIndexOutput is in RateLimiter. I think we should plug the rate limiter deeper on the OutputStream level in future (subclass BufferedOutputStream to limit rate) and allow to plug that into the FSDir impl.
          Hide
          Michael McCandless added a comment -

          Indexing perf of new patch looks fine too!

          Show
          Michael McCandless added a comment - Indexing perf of new patch looks fine too!
          Hide
          Uwe Schindler added a comment -

          New patch to make sure BufferedOutputStream is flushed on close(), not ignoring Exceptions.

          Show
          Uwe Schindler added a comment - New patch to make sure BufferedOutputStream is flushed on close(), not ignoring Exceptions.
          Hide
          Uwe Schindler added a comment -

          New patch, now fixes RateLimiter and nukes BufferedIndexOutput.

          The Ratelimiter was quite easy to fix. I only changed the single-byte write to not always call the volatile read on the getMinPauseCheckBytes() volatile. By this small change we no longer need to double buffer using BufferedIndexOutput. I think this should be fine now.

          Show
          Uwe Schindler added a comment - New patch, now fixes RateLimiter and nukes BufferedIndexOutput. The Ratelimiter was quite easy to fix. I only changed the single-byte write to not always call the volatile read on the getMinPauseCheckBytes() volatile. By this small change we no longer need to double buffer using BufferedIndexOutput. I think this should be fine now.
          Hide
          Uwe Schindler added a comment - - edited

          New patch with more cleanups:

          • Improved and simplified RateLimiterIndexOutput (Michael McCandless: can yu have a look. I think this is better now)
          • Added the chunking again to FSIndexOutput (see this comment on LUCENE-5164).

          I think this is ready

          Show
          Uwe Schindler added a comment - - edited New patch with more cleanups: Improved and simplified RateLimiterIndexOutput ( Michael McCandless : can yu have a look. I think this is better now) Added the chunking again to FSIndexOutput (see this comment on LUCENE-5164 ). I think this is ready
          Hide
          Uwe Schindler added a comment -

          Small simplicfications in OutputStreamIndexOutput, cooler close()

          Show
          Uwe Schindler added a comment - Small simplicfications in OutputStreamIndexOutput, cooler close()
          Hide
          ASF subversion and git services added a comment -

          Commit 1596057 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1596057 ]

          LUCENE-5678: Use FileOutputStream instead of RandomAccessFile to write index data. BufferedIndexOutput and related APIs were removed.

          Show
          ASF subversion and git services added a comment - Commit 1596057 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1596057 ] LUCENE-5678 : Use FileOutputStream instead of RandomAccessFile to write index data. BufferedIndexOutput and related APIs were removed.
          Hide
          Uwe Schindler added a comment -

          Committed to trunk.

          We might backport this to 4.x, once we removed deprecated seek() in IndexOutput. We can discuss this later, once this simplified/standardized code has proven to be as fast and stable.

          Show
          Uwe Schindler added a comment - Committed to trunk. We might backport this to 4.x, once we removed deprecated seek() in IndexOutput. We can discuss this later, once this simplified/standardized code has proven to be as fast and stable.
          Hide
          Uwe Schindler added a comment -

          Reopen for backport after LUCENE-5727.

          Show
          Uwe Schindler added a comment - Reopen for backport after LUCENE-5727 .
          Hide
          Uwe Schindler added a comment -

          For this to be backported, I also need to backport LUCENE-5582.

          Show
          Uwe Schindler added a comment - For this to be backported, I also need to backport LUCENE-5582 .
          Hide
          Uwe Schindler added a comment -

          Patch for 4.x.

          I had to remove the already deprecated setLength() on IndexOutput, too.

          Show
          Uwe Schindler added a comment - Patch for 4.x. I had to remove the already deprecated setLength() on IndexOutput, too.
          Hide
          ASF subversion and git services added a comment -

          Commit 1599548 from Uwe Schindler in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1599548 ]

          Merged revision(s) 1596057 from lucene/dev/trunk:
          LUCENE-5678: Use FileOutputStream instead of RandomAccessFile to write index data. BufferedIndexOutput and related APIs were removed.

          Show
          ASF subversion and git services added a comment - Commit 1599548 from Uwe Schindler in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1599548 ] Merged revision(s) 1596057 from lucene/dev/trunk: LUCENE-5678 : Use FileOutputStream instead of RandomAccessFile to write index data. BufferedIndexOutput and related APIs were removed.
          Hide
          ASF subversion and git services added a comment -

          Commit 1599550 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1599550 ]

          LUCENE-5678: Move changes.

          Show
          ASF subversion and git services added a comment - Commit 1599550 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1599550 ] LUCENE-5678 : Move changes.
          Hide
          ASF subversion and git services added a comment -

          Commit 1599568 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1599568 ]

          LUCENE-5678: Remove more seeking stuff

          Show
          ASF subversion and git services added a comment - Commit 1599568 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1599568 ] LUCENE-5678 : Remove more seeking stuff
          Hide
          ASF subversion and git services added a comment -

          Commit 1599573 from Uwe Schindler in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1599573 ]

          Merged revision(s) 1599550-1599568 from lucene/dev/trunk:
          LUCENE-5678: Remove more seeking stuff

          Show
          ASF subversion and git services added a comment - Commit 1599573 from Uwe Schindler in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1599573 ] Merged revision(s) 1599550-1599568 from lucene/dev/trunk: LUCENE-5678 : Remove more seeking stuff

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development