  1. Lucene - Core
  2. LUCENE-10019

Align file starts in CFS files to have proper alignment (8 bytes)

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 9.0
    • Fix Version/s: 9.0
    • Component/s: core/codecs, core/store
    • Labels: None
    • Lucene Fields: New

    Description

      While discussing MMapDirectory and fast access to file contents through mmap (https://github.com/apache/lucene/pull/177 and earlier versions of that draft), I found that for most Lucene files, the data inside is not aligned at all.

      We can't fix this easily, and it's not always important, but some files should really have CPU-friendly alignment from the very beginning! This is especially important when we use slices().

      I got many tests with aligned VarHandles to pass, but they broke instantly if the file was inside a compound (CFS) file.

      CompoundFormat.write() just appends all data to the IndexOutput and writes the offset to the entries file. The fix to make at least file starts aligned is to just write some null-bytes between the files, so startOffset is aligned to multiples of 8 bytes.
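      To illustrate the idea, here is a minimal sketch that uses plain JDK classes instead of Lucene's IndexOutput. The class and method names (AlignedCompoundSketch, writeAll, readFile) are hypothetical and are not part of Lucene. It pads with zero bytes before each file so that every recorded start offset is a multiple of 8, and it shows that reading back by offset and length is unaffected by the padding:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical stand-in for a compound-file writer; NOT Lucene's actual classes.
final class AlignedCompoundSketch {
    static final int ALIGNMENT = 8; // bytes; matches the 8-byte alignment proposed in this issue

    /**
     * Appends every file to {@code out}, writing zero bytes first so that each
     * file starts on an 8-byte boundary. Returns the start offset of each file,
     * which a real format would store in its entries table.
     */
    static long[] writeAll(ByteArrayOutputStream out, byte[][] files) {
        long[] startOffsets = new long[files.length];
        for (int i = 0; i < files.length; i++) {
            // Pad with null bytes until the current position is aligned.
            while (out.size() % ALIGNMENT != 0) {
                out.write(0);
            }
            startOffsets[i] = out.size();
            out.write(files[i], 0, files[i].length);
        }
        return startOffsets;
    }

    /** Reads one file back using only its recorded offset and length; padding is never touched. */
    static byte[] readFile(byte[] compound, long startOffset, int length) {
        int from = Math.toIntExact(startOffset);
        return Arrays.copyOfRange(compound, from, from + length);
    }
}
```

      With input files of 3, 10, and 8 bytes, this sketch records start offsets 0, 8, and 24. The reader needs no changes because it only ever uses the stored offset and length.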

      At a later stage we could also think of aligning to LBA blocks/sectors/whatever to make OS paging work better. But for performance of index access, slices of compound files when memory mapped should at least align to 8 bytes.

      Fix is easy: Just add some modulo on startOffset and write some extra bytes before the next file is serialized. The change is only 2 lines. It does not even change index format!
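      The modulo step can be sketched in isolation as follows. This is only an illustration of the arithmetic under the assumption of a non-negative offset and a positive alignment; AlignmentMath, paddingFor, and alignUp are hypothetical names, not the code from the pull request:

```java
// Hypothetical helper showing the modulo arithmetic; NOT the actual Lucene patch.
final class AlignmentMath {
    /** Number of padding bytes needed so that offset + padding is a multiple of alignment. */
    static int paddingFor(long offset, int alignment) {
        if (offset < 0 || alignment <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and alignment must be > 0");
        }
        // The outer modulo maps an already aligned offset to 0 instead of a full block.
        return (int) ((alignment - offset % alignment) % alignment);
    }

    /** Smallest multiple of alignment that is greater than or equal to offset. */
    static long alignUp(long offset, int alignment) {
        return Math.addExact(offset, paddingFor(offset, alignment));
    }
}
```

      For example, an offset of 13 with 8-byte alignment needs 3 padding bytes and becomes 16, while an offset of 16 needs none.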

      I'd like to get this in for 9.0 so we can at least say: our CFS files are aligned. Aligning other files like docvalues to better help CPU is then possible.

      I will provide a simple pull request for Lucene90CompoundFormat soon. If you don't see any problems, this is a no-brainer.

            People

              Assignee: Uwe Schindler (uschindler)
              Reporter: Uwe Schindler (uschindler)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 40m