IGNITE-5884

Change default pageSize of page memory to 4KB

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3
    • Component/s: persistence

    Description

      Checkpoint write speed is suboptimal with the default 2k page size on most UNIX-based environments with SSD disks. There are several reasons for this:
      1) The page size of the Linux page cache is 4k by default on most kernels (you can check yours with the "getconf PAGE_SIZE" command). Each 2k random write dirties a whole 4k cache page, so for the same amount of data written, the vm.dirty_ratio threshold is reached twice as fast with 2k random writes as with 4k random writes.
      2) Most SSD manufacturers don't expose the actual disk page size, but they recommend writing at least 4k at once. Also, 4k blocks are used when benchmarking SSD random writes.
      Related question: https://superuser.com/questions/1168014/nvme-ssd-why-is-4k-writing-faster-than-reading
      Article by Emmanuel Goossaert describing why writing less than a page is counterproductive: http://codecapsule.com/2014/02/12/coding-for-ssds-part-3-pages-blocks-and-the-flash-translation-layer/
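      The arithmetic behind reason 1 can be sketched as follows. This is only an illustration, not code from the attached benchmark: the class and method names are hypothetical, and it assumes that every write is aligned, lands on a different cache page, and that the kernel tracks dirtiness per whole cache page.

```java
/** Estimates how many page-cache bytes become dirty for a batch of random writes. */
final class DirtyPageEstimate {
    /**
     * Assumption: each write is aligned and touches cache pages that no other
     * write in the batch touches, so every touched cache page counts once.
     */
    static long dirtyBytes(long writes, int writeSize, int cachePageSize) {
        // Number of cache pages a single write touches, rounded up.
        long pagesPerWrite = (writeSize + cachePageSize - 1) / cachePageSize;
        return writes * pagesPerWrite * cachePageSize;
    }

    public static void main(String[] args) {
        int cachePageSize = 4096;   // assumed result of `getconf PAGE_SIZE`
        long payload = 1L << 30;    // 1 GiB of checkpoint data, illustrative only

        for (int pageSize : new int[] {2048, 4096}) {
            long writes = payload / pageSize;
            long dirty = dirtyBytes(writes, pageSize, cachePageSize);
            System.out.println("pageSize=" + pageSize
                + " dirtyMiB=" + (dirty >> 20)
                + " dirtyPerPayload=" + (dirty / payload));
        }
    }
}
```

      Under these assumptions the same payload dirties twice as many cache bytes with 2k pages as with 4k pages, which is why the vm.dirty_ratio threshold would be hit sooner.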
      I've prepared a checkpoint emulation benchmark (code and results attached). A run on production-level hardware (CentOS, 100 GB RAM, total LFS size of 100 GB, vm.dirty_ratio=10) showed that checkpointing with 4k pages is much more efficient than with 2k pages.
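      For readers without the attachment, a much smaller emulation of the same idea might look like the sketch below. It is not the attached CpBenchmark.java: the class name, method name, and sizes are hypothetical, and a run this small will mostly measure the page cache rather than the SSD.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;

/** Hypothetical miniature of a checkpoint: random page-sized writes followed by one fsync. */
final class CheckpointWriteSketch {
    /** Writes {@code pages} randomly placed pages, forces them to disk, returns elapsed nanoseconds. */
    static long writeRandomPages(Path file, long fileSize, int pageSize, int pages, long seed)
        throws IOException {
        Random rnd = new Random(seed);
        ByteBuffer buf = ByteBuffer.allocateDirect(pageSize);
        long slots = fileSize / pageSize;   // number of page-aligned positions in the file
        long start = System.nanoTime();

        try (FileChannel ch = FileChannel.open(file,
            StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (int i = 0; i < pages; i++) {
                long offset = (long) (rnd.nextDouble() * slots) * pageSize;
                buf.clear();
                // A positional write may be partial, so loop until the page is fully written.
                while (buf.hasRemaining())
                    ch.write(buf, offset + buf.position());
            }
            ch.force(false);                // flush dirty data, as a checkpoint would
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        long fileSize = 64L << 20;          // 64 MiB file, illustrative only
        long payload = 16L << 20;           // same amount of data for both page sizes

        for (int pageSize : new int[] {2048, 4096}) {
            Path file = Files.createTempFile("cp-sketch", ".bin");
            try {
                long nanos = writeRandomPages(file, fileSize, pageSize,
                    (int) (payload / pageSize), 42L);
                System.out.println("pageSize=" + pageSize + " elapsedMs=" + nanos / 1_000_000);
            } finally {
                Files.deleteIfExists(file);
            }
        }
    }
}
```

      To approximate the conditions described above, the file would need to be far larger than the amount of memory allowed to stay dirty, and iostat would need to be watched alongside the run.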
      Important: backward compatibility must be ensured with LFS files created with the old 2k default page size.
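      One possible policy for the compatibility requirement is sketched below. It is an assumption for illustration, not the fix that was committed: the class, the method, and the idea that the page size of existing files is known at startup are all hypothetical.

```java
import java.util.OptionalInt;

/** Hypothetical policy for choosing the effective page size at node start. */
final class PageSizeResolver {
    static final int LEGACY_DEFAULT_PAGE_SIZE = 2048;   // old default named in this issue
    static final int NEW_DEFAULT_PAGE_SIZE = 4096;      // proposed default

    /**
     * @param configured page size set explicitly by the user, if any
     * @param existing   page size of LFS files already on disk, if any (assumed to be known)
     */
    static int resolve(OptionalInt configured, OptionalInt existing) {
        if (existing.isPresent()) {
            // Existing files win: they cannot be reinterpreted with another page size.
            if (configured.isPresent() && configured.getAsInt() != existing.getAsInt())
                throw new IllegalStateException("Configured page size " + configured.getAsInt()
                    + " does not match existing files (" + existing.getAsInt() + ")");
            return existing.getAsInt();
        }
        // Fresh storage: honor the explicit setting, otherwise use the new default.
        return configured.orElse(NEW_DEFAULT_PAGE_SIZE);
    }

    public static void main(String[] args) {
        // Storage created under the old default, started with no explicit setting.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.of(LEGACY_DEFAULT_PAGE_SIZE)));
        // Brand-new storage with no explicit setting.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.empty()));
    }
}
```

      With this policy a node that finds 2k files keeps using 2k even though the default has changed, and only fresh storage picks up 4k.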

      Attachments

        1. CpBenchmark.java
          52 kB
          Ivan Rakov
        2. iostat.log
          6.62 MB
          Ivan Rakov
        3. ssdlab.log
          11 kB
          Ivan Rakov

          People

            Assignee: Ivan Rakov (ivan.glukos)
            Reporter: Ivan Rakov (ivan.glukos)
            Votes: 0
            Watchers: 5

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 10m