Lucene - Core / LUCENE-1121

Use nio.transferTo when copying large blocks of bytes

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/store
    • Labels: None
    • Lucene Fields: New

      Description

      When building a CFS file, and also when merging stored fields (and
      term vectors, with LUCENE-1120), we copy large blocks of bytes at
      once.

      We currently do this with an intermediate buffer.

      But, nio.transferTo should be somewhat faster on OS's that offer low
      level IO APIs for moving blocks of bytes between files.
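The two approaches under discussion can be sketched in Java as follows. This is an editorial sketch for illustration, not the attached patch's code; the 64 KB buffer size matches the benchmark described later in this issue.

```java
import java.io.*;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Random;

public class CopyDemo {
    // Current approach: copy through an intermediate heap buffer.
    static void bufferCopy(File src, File dst) throws IOException {
        try (InputStream in = new FileInputStream(src);
             OutputStream out = new FileOutputStream(dst)) {
            byte[] buf = new byte[64 * 1024];
            for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
        }
    }

    // Proposed approach: FileChannel.transferTo, which can use the OS's
    // low-level file-to-file copy path. A single call may transfer fewer
    // bytes than requested, so loop until the whole file is copied.
    static void transferToCopy(File src, File dst) throws IOException {
        try (FileChannel in = new FileInputStream(src).getChannel();
             FileChannel out = new FileOutputStream(dst).getChannel()) {
            long pos = 0, size = in.size();
            while (pos < size) pos += in.transferTo(pos, size - pos, out);
        }
    }

    // Copy a small random file both ways and verify the copies match.
    static boolean selfTest() {
        try {
            File src = File.createTempFile("copydemo", ".src");
            File a = File.createTempFile("copydemo", ".buf");
            File b = File.createTempFile("copydemo", ".nio");
            src.deleteOnExit(); a.deleteOnExit(); b.deleteOnExit();
            byte[] data = new byte[300_000];
            new Random(42).nextBytes(data);
            Files.write(src.toPath(), data);
            bufferCopy(src, a);
            transferToCopy(src, b);
            return Arrays.equals(data, Files.readAllBytes(a.toPath()))
                && Arrays.equals(data, Files.readAllBytes(b.toPath()));
        } catch (IOException e) {
            return false;
        }
    }
}
```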

      Attachments

      1. LUCENE-1121.patch (8 kB) - Michael McCandless
      2. LUCENE-1121.patch (5 kB) - Michael McCandless
      3. testIO.java (3 kB) - Michael McCandless

        Activity

        Michael McCandless (mikemccand) added a comment:

        Attached patch. All tests pass.

        We shouldn't push this into 2.3.

        I still need to test across more platforms and see what performance
        impact is.

        Michael McCandless (mikemccand) added a comment:

        Attached patch. All tests pass ... but, I don't think we should
        commit this.

        I ran performance tests across several platforms. All times are best
        of 3 runs, indexing first 200K docs of Wikipedia. I used
        SerialMergeScheduler for these tests so I could more easily measure
        the impact on merging as well:

        Linux (2.6.22), ReiserFS on RAID5 array: 528.1 sec vs 537.0 sec (1.7% faster)
        Mac OS X 10.4 on RAID0 array: 402.6 sec vs 405.0 sec (0.6% faster)
        Windows Server 2003 R2 (64-bit) on RAID0 array: 472.3 sec vs 752.6 sec (59.3% SLOWER)

        I was rather stunned by the result on Windows Server 2003; I ran that
        test twice to be sure. It's really true. My only guess is write
        caching (which is turned on for this drive) is somehow not used when
        using transferTo.

        So then I made a standalone test that creates a big file (you specify
        the size as a multiple of 10 MB), and then copies that big file first
        using transferTo and then using an intermediate 64 KB buffer. Results below:

        OS X 10.4 on external firewire drive:
        create 500 MB file... 31689 msec
        transferTo... 31947 msec
        create 500 MB file... 31412 msec
        buffer... 31215 msec
        SLOWER 2.345%

        OS X 10.4 on 4-drive RAID 0 array
        create 500 MB file... 2409 msec
        transferTo... 2449 msec
        create 500 MB file... 2366 msec
        buffer... 2649 msec
        FASTER 7.55%

        Linux 2.6.22 on single SATA drive, ext3
        create 500 MB file... 12841 msec
        transferTo... 12438 msec
        create 500 MB file... 11219 msec
        buffer... 12003 msec
        SLOWER 3.624%

        Linux 2.6.22 on 6-drive RAID 5 array, ext3
        create 500 MB file... 9647 msec
        transferTo... 9107 msec
        create 500 MB file... 9092 msec
        buffer... 10089 msec
        FASTER 9.733%

        Windows Server 2003 R2 (64-bit), single NTFS internal SATA drive
        create 500 MB file... 32485 msec
        transferTo... 38922 msec
        create 500 MB file... 33484 msec
        buffer... 1375 msec
        SLOWER 2,730.691%

        Windows XP Pro SP2, laptop hard drive
        create 200 MB file... 20159 msec
        transferTo... 17515 msec
        create 200 MB file... 24265 msec
        buffer... 18397 msec
        FASTER 4.794%

        Bottom line is: FileChannel.transferTo is not always a win and can
        be a catastrophic loss. I think we should stick with tried-and-true,
        simple buffer copying, at least for now...

        Michael McCandless (mikemccand) added a comment:

        Attaching standalone test (testIO.java). Just run it like this:

        java testIO 50

        and it will create a 500 MB file and test copying it w/ transferTo vs
        intermediate buffer.

        Doug Cutting (cutting) added a comment:

        What JVM were these tests run with?

        Michael McCandless (mikemccand) added a comment:

        Sun's JVM 1.4 (on the Windows XP Pro SP2 laptop), Sun's JVM 1.6 (on the Windows Server 2003 R2 64-bit machine), Sun's JVM 1.5 on Linux, and Apple's release of Sun's JVM for the two OS X runs.

        Doug Cutting (cutting) added a comment:

        For Hadoop, we've seen significant performance improvements on Linux in Sun's 1.6 over 1.5. Clearly, 1.6 didn't help on Windows Server 2003, but it would be good to know if there are any cases where it makes a huge improvement. If there are, then it could be a useful option.

        Michael McCandless (mikemccand) added a comment:

        That's interesting ... I'll test Sun's JVM 1.6 on Linux.

        Maybe we should commit this, but leave the default copy method using the intermediate buffer? The patch adds set/getCopyMethod to FSDirectory.
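The patch itself isn't reproduced in this issue, so the following is only a hypothetical sketch of what an opt-in set/getCopyMethod switch might look like; the CopyMethod enum, the method names, and the BUFFER default are illustrative assumptions, not the patch's actual API.

```java
// Hypothetical sketch of a switchable copy method in the spirit of the
// patch's set/getCopyMethod on FSDirectory. All names here are
// illustrative assumptions.
public class CopyMethodDemo {
    enum CopyMethod { BUFFER, TRANSFER_TO }

    // Buffer copying stays the default, so transferTo is strictly opt-in.
    private static volatile CopyMethod copyMethod = CopyMethod.BUFFER;

    static void setCopyMethod(CopyMethod m) { copyMethod = m; }

    static CopyMethod getCopyMethod() { return copyMethod; }
}
```

With this shape, users on platforms where transferTo wins could enable it explicitly, while everyone else keeps the proven buffer path.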

        Mark Miller (markrmiller@gmail.com) added a comment:

        Here are some more results from Windows XP and Windows Server 2003 machines. This is with Java 1.6.

        Windows 2003 R2 - internal sata drive

        D:\>java Test 50
        create 500 MB file... 28625 msec
        transferTo... 17656 msec
        create 500 MB file... 13390 msec
        buffer... 8391 msec
        SLOWER 110.416%

        D:\>java Test 50
        create 500 MB file... 14141 msec
        transferTo... 8765 msec
        create 500 MB file... 13531 msec
        buffer... 1531 msec
        SLOWER 472.502%

        D:\>java Test 50
        create 500 MB file... 13578 msec
        transferTo... 9282 msec
        create 500 MB file... 13391 msec
        buffer... 1235 msec
        SLOWER 651.579%

        Windows XP SP2 - laptop drive

        D:\>java Test 50
        create 500 MB file... 18737 msec
        transferTo... 28239 msec
        create 500 MB file... 19113 msec
        buffer... 65839 msec
        FASTER 57.109%

        D:\>java Test 50
        create 500 MB file... 21785 msec
        transferTo... 24801 msec
        create 500 MB file... 17940 msec
        buffer... 33615 msec
        FASTER 26.22%

        D:\>java Test 50
        create 500 MB file... 22520 msec
        transferTo... 24300 msec
        create 500 MB file... 19644 msec
        buffer... 34349 msec
        FASTER 29.256%

        Michael McCandless (mikemccand) added a comment:

        OK I ran Sun JDK 1.6.0_04 on Linux:

        Linux 2.6.22, single SATA drive, ext3:
        create 500 MB file... 13088 msec
        transferTo... 12796 msec
        create 500 MB file... 10727 msec
        buffer... 12291 msec
        SLOWER 4.109%

        Linux 2.6.22, on 6-drive RAID 5 array, reiserfs:
        create 500 MB file... 11135 msec
        transferTo... 11068 msec
        create 500 MB file... 8599 msec
        buffer... 10708 msec
        SLOWER 3.362%

        Raghu Angadi (rangadi) added a comment:

        The only savings I would expect from transferTo() is reduced CPU usage. Does the benchmark above measure wall-clock time or CPU time? By the way, the Windows results are pretty... strange.

        HADOOP-3164 shows the expected CPU benefit. I still need to do more extensive tests where I max out the CPU with and without the patch and compare wall-clock time. The initial test just compares the CPU time reported in /proc/<pid>/stat for a test that is disk bound.
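As an aside on the wall-clock vs CPU-time distinction: in Java itself, per-thread CPU time can be read with the standard ThreadMXBean API, so a benchmark can report both measurements without going through /proc. A minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuVsWall {
    // Returns { wallNanos, cpuNanos } for the given task, measured on the
    // calling thread. cpuNanos is -1 when the JVM does not support
    // per-thread CPU-time measurement.
    static long[] measure(Runnable task) {
        ThreadMXBean tm = ManagementFactory.getThreadMXBean();
        long cpu0 = tm.isCurrentThreadCpuTimeSupported()
                ? tm.getCurrentThreadCpuTime() : -1;
        long wall0 = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wall0;
        long cpu = (cpu0 < 0) ? -1 : tm.getCurrentThreadCpuTime() - cpu0;
        return new long[] { wall, cpu };
    }
}
```

For a disk-bound copy, wall time should far exceed CPU time; transferTo's expected benefit would show up in the CPU figure.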

        Mark Miller (markrmiller@gmail.com) added a comment:

        Isn't this still a nice little optimization for compound-file copies? When not on Windows Server it's faster in general, and even when the times are similar you still get the reduced CPU usage.

        At worst it seems we should enable it when we detect a non-Windows OS. We could even allow a couple of specific Windows versions we know work well - the XP results I got were fantastic, and the ones Mike got were not bad. Probably not necessary, since most deployments will probably be on the server editions, but future versions might be better.

        Seems like a little win on 'nix systems anyway, just from the CPU savings.

        Mark Miller (markrmiller@gmail.com) added a comment:

        Never mind - it appears that when you chunk, you lose the CPU win, and when you don't chunk you get the win but it performs badly after other java.io operations. Bummer.
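For reference, the chunked variant described above can be sketched as follows; the 64 KB chunk size is illustrative, and this is an editorial sketch rather than Mark's actual test code:

```java
import java.io.*;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.util.Arrays;

public class ChunkedTransfer {
    // Copy from in to out with transferTo in fixed-size chunks. Chunking
    // bounds each request, but per the observation above it also appears
    // to forfeit the CPU savings of one large transfer.
    static void transfer(FileChannel in, FileChannel out, long chunkSize)
            throws IOException {
        long pos = 0, size = in.size();
        while (pos < size) {
            long n = in.transferTo(pos, Math.min(chunkSize, size - pos), out);
            if (n <= 0) break;  // defensive: avoid spinning if nothing moves
            pos += n;
        }
    }

    // Round-trip a small file through a 64 KB-chunked copy and verify it.
    static boolean selfTest() {
        try {
            File src = File.createTempFile("chunk", ".src");
            File dst = File.createTempFile("chunk", ".dst");
            src.deleteOnExit(); dst.deleteOnExit();
            byte[] data = new byte[300_000];
            Files.write(src.toPath(), data);
            try (FileChannel in = new FileInputStream(src).getChannel();
                 FileChannel out = new FileOutputStream(dst).getChannel()) {
                transfer(in, out, 64 * 1024);
            }
            return Arrays.equals(data, Files.readAllBytes(dst.toPath()));
        } catch (IOException e) {
            return false;
        }
    }
}
```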


          People

          • Assignee: Michael McCandless (mikemccand)
          • Reporter: Michael McCandless (mikemccand)
          • Votes: 0
          • Watchers: 0
