Apache Arrow / ARROW-9633

[C++] Do not toggle memory mapping globally in LocalFileSystem

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++
    • Labels: None

    Description

      In the context of the Datasets API, some file formats (such as Arrow IPC files) benefit greatly from memory mapping, while others benefit much less. In some scenarios memory mapping can also fail, for example on network-attached storage devices. A single filesystem instance may be used to read different kinds of files, some with memory mapping and some without, and the Datasets API should be able to fall back to non-memory-mapped reads when an attempt to memory map fails. It would therefore make sense to have a non-global option for this, instead of the filesystem-wide setting in:

      https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/localfs.h

      I would suggest adding a new filesystem API, something like OpenMappedInputFile, with options that control the behavior when memory mapping is not possible. Possible behaviors include:

      • Falling back on a normal RandomAccessFile
      • Reading the entire file into memory (or even tmpfs?) and then wrapping it in a BufferReader
      • Failing
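
      To make the three options concrete, here is a minimal sketch of how a per-call fallback policy could look. It deliberately uses only POSIX calls and the C++ standard library, not Arrow itself: MmapFallback, InputSource, MappedSource, FileSource and MemorySource are hypothetical names invented for illustration, and the OpenMappedInputFile signature shown is an assumption, not an existing or agreed Arrow API.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical per-call policy (not Arrow API): what to do if mmap() fails.
enum class MmapFallback { kFail, kRandomAccessFile, kReadIntoMemory };

// Minimal stand-in for a RandomAccessFile-like interface.
class InputSource {
 public:
  virtual ~InputSource() = default;
  virtual std::string kind() const = 0;
  virtual std::string ReadAt(size_t offset, size_t nbytes) = 0;
};

class MappedSource : public InputSource {  // reads served from the mapping
 public:
  MappedSource(void* addr, size_t size) : addr_(addr), size_(size) {}
  ~MappedSource() override { munmap(addr_, size_); }
  std::string kind() const override { return "mmap"; }
  std::string ReadAt(size_t offset, size_t nbytes) override {
    if (offset >= size_) return "";
    return std::string(static_cast<const char*>(addr_) + offset,
                       std::min(nbytes, size_ - offset));
  }
 private:
  void* addr_;
  size_t size_;
};

class FileSource : public InputSource {  // plain positional reads, owns fd
 public:
  explicit FileSource(int fd) : fd_(fd) { assert(fd_ >= 0); }
  ~FileSource() override { close(fd_); }
  std::string kind() const override { return "file"; }
  std::string ReadAt(size_t offset, size_t nbytes) override {
    std::string out(nbytes, '\0');
    // Single pread() for brevity; production code would loop on short reads.
    ssize_t n = pread(fd_, &out[0], nbytes, static_cast<off_t>(offset));
    out.resize(n > 0 ? static_cast<size_t>(n) : 0);
    return out;
  }
 private:
  int fd_;
};

class MemorySource : public InputSource {  // whole file copied into memory
 public:
  explicit MemorySource(std::string data) : data_(std::move(data)) {}
  std::string kind() const override { return "memory"; }
  std::string ReadAt(size_t offset, size_t nbytes) override {
    return offset >= data_.size() ? "" : data_.substr(offset, nbytes);
  }
 private:
  std::string data_;
};

// Try to memory map `path`; if that fails, apply the caller's policy.
std::unique_ptr<InputSource> OpenMappedInputFile(const std::string& path,
                                                 MmapFallback fallback) {
  int fd = open(path.c_str(), O_RDONLY);
  if (fd < 0) throw std::runtime_error("cannot open " + path);
  struct stat st;
  if (fstat(fd, &st) != 0) {
    close(fd);
    throw std::runtime_error("cannot stat " + path);
  }
  size_t size = static_cast<size_t>(st.st_size);
  void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  if (addr != MAP_FAILED) {
    close(fd);  // the mapping remains valid after the descriptor is closed
    return std::unique_ptr<InputSource>(new MappedSource(addr, size));
  }
  switch (fallback) {
    case MmapFallback::kRandomAccessFile:  // FileSource takes ownership of fd
      return std::unique_ptr<InputSource>(new FileSource(fd));
    case MmapFallback::kReadIntoMemory: {
      FileSource file(fd);  // closes fd when this scope ends
      return std::unique_ptr<InputSource>(
          new MemorySource(file.ReadAt(0, size)));
    }
    case MmapFallback::kFail:
      break;
  }
  close(fd);
  throw std::runtime_error("cannot memory map " + path);
}
```

      With this shape the choice is made per file rather than per filesystem, so a caller could request a mapping for Arrow IPC files and ordinary reads for other formats. One easy way to exercise the fallback paths is a zero-length file, since POSIX requires mmap() to fail when the length is zero.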

          People

            Assignee: Unassigned
            Reporter: Wes McKinney (wesm)
            Votes: 0
            Watchers: 4
