KAFKA-414: Evaluate mmap-based writes for Log implementation

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix

    Description

      While working on another project, I noticed that small-write performance for FileChannel is very poor. This likely affects Kafka when messages are produced one at a time or in small batches. I wrote a quick program to evaluate the following options:
      raf = RandomAccessFile
      mmap = MappedByteBuffer
      channel = FileChannel
      For the latter two, I tried both direct-allocated and non-direct (heap) allocated buffers (direct allocation is supposed to be faster).
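
The three write paths being compared can be sketched roughly as follows. This is not the attached TestLinearWritePerformance.java; the class name `WritePathsSketch`, the method names, and the sizes in `main` are made up for illustration, and `fileLength` is assumed to be a multiple of `writeSize`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the three write paths; not the attached benchmark.
public class WritePathsSketch {

    // raf: one RandomAccessFile.write call per chunk.
    static void writeRaf(Path path, int fileLength, int writeSize) throws IOException {
        byte[] chunk = new byte[writeSize];
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            for (int written = 0; written + writeSize <= fileLength; written += writeSize) {
                raf.write(chunk);
            }
        }
    }

    // channel: one FileChannel.write call per chunk, from a direct or heap buffer.
    static void writeChannel(Path path, int fileLength, int writeSize, boolean direct)
            throws IOException {
        ByteBuffer buf = direct ? ByteBuffer.allocateDirect(writeSize)
                                : ByteBuffer.allocate(writeSize);
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            for (int written = 0; written + writeSize <= fileLength; written += writeSize) {
                buf.clear();                      // reset position/limit, contents unchanged
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
            }
        }
    }

    // mmap: map the whole file once, then copy each chunk into the mapping.
    static void writeMmap(Path path, int fileLength, int writeSize, boolean direct)
            throws IOException {
        ByteBuffer src = direct ? ByteBuffer.allocateDirect(writeSize)
                                : ByteBuffer.allocate(writeSize);
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            // Mapping READ_WRITE beyond the current size grows the file to fileLength.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, fileLength);
            for (int written = 0; written + writeSize <= fileLength; written += writeSize) {
                src.clear();
                map.put(src);                     // memory copy, no system call per chunk
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int fileLength = 16 * 1024 * 1024;        // illustrative sizes only
        int writeSize = 64;
        Path path = Files.createTempFile("write-paths", ".dat");
        try {
            long start = System.nanoTime();
            writeChannel(path, fileLength, writeSize, true);
            double channelSecs = (System.nanoTime() - start) / 1e9;

            start = System.nanoTime();
            writeMmap(path, fileLength, writeSize, true);
            double mmapSecs = (System.nanoTime() - start) / 1e9;

            System.out.printf("channel_direct: %.2f MB/sec%n", fileLength / 1e6 / channelSecs);
            System.out.printf("mmap_direct:    %.2f MB/sec%n", fileLength / 1e6 / mmapSecs);
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

A single untimed-warmup run like this is only a rough indication; JIT warm-up and page-cache state will affect the numbers.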

      Here are the results I saw:

      [jkreps@jkreps-ld valencia]$ java -XX:+UseConcMarkSweepGC -cp target/test-classes -server -Xmx1G -Xms1G valencia.TestLinearWritePerformance $((256*1024)) $((1*1024*1024*1024)) 2
      All throughput columns are in MB/sec.

      file_length  size (bytes)      raf  channel_direct  mmap_direct  channel_heap  mmap_heap
          1000000             1     0.60            0.52        28.66          0.55      50.40
          2000000             2     1.18            1.16        67.84          1.13      74.17
          4000000             4     2.33            2.26       121.52          2.23     122.14
          8000000             8     4.72            4.51       228.39          4.41     175.20
         16000000            16     9.25            8.96       393.24          8.88     314.11
         32000000            32    18.43           17.93       601.83         17.28     482.25
         64000000            64    36.25           35.21       799.98         34.39     680.39
        128000000           128    69.80           67.52       963.30         66.21     870.82
        256000000           256   134.24          129.25      1064.13        129.01    1014.00
        512000000           512   247.38          238.24      1124.71        235.57    1091.81
       1024000000          1024   420.42          411.43      1170.94        406.57    1138.80
       1073741824          2048   671.93          658.96      1133.63        650.39    1151.81
       1073741824          4096  1007.84          989.88      1165.60        976.10    1158.49
       1073741824          8192  1137.12         1145.01      1189.38       1128.30    1174.66
       1073741824         16384  1172.63         1228.33      1192.19       1206.58    1156.37
       1073741824         32768  1221.13         1295.37      1170.96       1262.28    1156.65
       1073741824         65536  1255.23         1306.33      1160.22       1268.24    1142.52
       1073741824        131072  1240.65         1292.06      1101.90       1269.00    1119.14

      The size column gives the size of each write in bytes, and the file_length column gives the total length of the file written.

      Sustained over time, the roughly 1 GB/sec figures are not achievable, because the disk on my machine could not keep up. Nonetheless, it is worth noting that for writes of up to 256 bytes the disk is not the bottleneck; the per-write overhead is.
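
One way to see the per-write overhead is to convert throughput into time per write call. The sketch below does that arithmetic for a few rows of the table, assuming 1 MB means 1,000,000 bytes (the issue does not say which definition the benchmark used); the class and method names are hypothetical. Under that assumption, channel_direct costs roughly 2 microseconds per call at both 1-byte and 256-byte writes, while mmap_direct is in the range of tens to a few hundred nanoseconds.

```java
// Hypothetical helper: converts a throughput figure into time per write call.
public class PerWriteOverhead {

    // Assumption: 1 MB = 1,000,000 bytes.
    static double nanosPerWrite(int writeBytes, double mbPerSec) {
        double writesPerSec = mbPerSec * 1_000_000.0 / writeBytes;
        return 1e9 / writesPerSec;
    }

    public static void main(String[] args) {
        // Inputs are taken from the results table in this issue.
        System.out.printf("channel_direct,   1-byte writes: %.0f ns/write%n",
                nanosPerWrite(1, 0.52));
        System.out.printf("channel_direct, 256-byte writes: %.0f ns/write%n",
                nanosPerWrite(256, 129.25));
        System.out.printf("mmap_direct,      1-byte writes: %.0f ns/write%n",
                nanosPerWrite(1, 28.66));
        System.out.printf("mmap_direct,    256-byte writes: %.0f ns/write%n",
                nanosPerWrite(256, 1064.13));
    }
}
```

The near-constant cost per FileChannel call, independent of write size in this range, is what you would expect if a fixed per-call cost dominates rather than the volume of data.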

      This suggests that a better strategy for the log would be to pre-allocate the segment and mmap it, then use the memory map for writes while continuing to use the FileChannel for reads.
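
A minimal sketch of that strategy, assuming a fixed segment size and a single writer, might look like the following. `MappedSegmentSketch` and its methods are hypothetical and are not Kafka's Log API. Two caveats are worth noting: `setLength` may create a sparse file rather than reserve disk blocks on some filesystems, and the Java documentation does not guarantee when data written through the mapping becomes visible to channel reads, so the sketch relies on typical OS page-cache behavior.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

// Hypothetical sketch: pre-allocated segment, mmap for appends, FileChannel for reads.
public class MappedSegmentSketch implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileChannel channel;
    private final MappedByteBuffer map;

    public MappedSegmentSketch(Path path, int segmentBytes) throws IOException {
        file = new RandomAccessFile(path.toFile(), "rw");
        file.setLength(segmentBytes);             // pre-allocate (possibly sparse) segment
        channel = file.getChannel();
        map = channel.map(FileChannel.MapMode.READ_WRITE, 0, segmentBytes);
    }

    // Appends a message through the mapping; returns its byte offset, or -1 if full.
    public synchronized int append(byte[] message) {
        if (message.length > map.remaining()) {
            return -1;                            // caller would roll to a new segment
        }
        int offset = map.position();
        map.put(message);                         // plain memory copy, no system call
        return offset;
    }

    // Reads through the FileChannel with positional reads; fills dst or stops at EOF.
    // The caller must not read past the last appended byte: the tail is zero-filled.
    public int read(int offset, ByteBuffer dst) throws IOException {
        int total = 0;
        while (dst.hasRemaining()) {
            int n = channel.read(dst, (long) offset + total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    // Forces mapped changes to the storage device.
    public void flush() {
        map.force();
    }

    @Override
    public void close() throws IOException {
        map.force();
        channel.close();
        file.close();
    }
}
```

Because the file is pre-allocated, its length no longer marks the end of valid data, so a real implementation would need to track the logical end of the log separately and handle recovery after a crash.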

      Attachments

        1. TestLinearWritePerformance.java
          4 kB
          Jay Kreps
        2. linear_write_performance.txt
          4 kB
          Jay Kreps

            People

              Assignee: Unassigned
              Reporter: Jay Kreps (jkreps)