Lucene - Core / LUCENE-2787

disable atime for DirectIOLinuxDirectory

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: modules/other
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      In Linux's open():
      O_NOATIME
      (Since Linux 2.6.8) Do not update the file last access time (st_atime in the inode) when the file is read(2). This flag is intended for use by indexing or backup programs, where its use can significantly reduce the amount of disk activity. This flag may not be effective on all filesystems. One example is NFS, where the server maintains the access time.

      So we should do this in our Linux-specific DirectIOLinuxDirectory.
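A minimal sketch of what passing this flag looks like, written in Python rather than the native code behind DirectIOLinuxDirectory, and not taken from the attached patch. The helper names `open_noatime` and `read_all` are hypothetical. It assumes that a platform without `O_NOATIME` should simply open the file normally, and it falls back the same way when open(2) rejects the flag with EPERM, which the man page documents for callers that do not own the file.

```python
import os

# O_NOATIME only exists on Linux; use 0 (no extra flag) elsewhere.
O_NOATIME = getattr(os, "O_NOATIME", 0)


def open_noatime(path):
    """Open path read-only, asking the kernel not to update st_atime.

    Returns (fd, used_noatime). Hypothetical helper for illustration.
    """
    if O_NOATIME:
        try:
            return os.open(path, os.O_RDONLY | O_NOATIME), True
        except PermissionError:
            # EPERM: the caller does not own the file and is not privileged.
            pass
    return os.open(path, os.O_RDONLY), False


def read_all(path, chunk_size=65536):
    """Read a whole file through a descriptor opened by open_noatime."""
    fd, _ = open_noatime(path)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

Because the flag is set per descriptor, this needs no root access and no change to how the filesystem is mounted.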

      Separately (off-topic), it would be better if this were a LinuxDirectory that only uses O_DIRECT when it should.
      It would be nice to think about an optional modules/native for common platforms, similar to what Tomcat provides.
      It's easier to test directories like this now (-Dtests.directory)...

      1. LUCENE-2787.patch
        0.7 kB
        Robert Muir

        Activity

        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Robert Muir added a comment -

        Committed revision 1041954 (trunk), 1041957 (3x)

        Simon Willnauer added a comment -

        After all, I think we should really do it. I cannot think of any situation where you want atime to be updated here. It seems that many distributions use relatime, which is smarter about it; see: http://lwn.net/Articles/244829/

        We should really document that on the wiki so that folks can check what their distribution does, or set it to noatime by default.

        simon
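For anyone who wants to check what their distribution does, here is a rough Python sketch that classifies the atime policy of each mount from text in the format of Linux's /proc/mounts. The function names are hypothetical. It assumes that a mount listing neither `noatime` nor `relatime` updates atime on every access, and it ignores the octal escaping that /proc/mounts applies to mount points containing spaces.

```python
def parse_mount_options(mounts_text):
    """Map each mount point to its set of options (/proc/mounts format)."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        # A later entry for the same mount point overrides an earlier one.
        result[mount_point] = set(options.split(","))
    return result


def atime_policy(options):
    """Classify a set of mount options by how atime is maintained."""
    if "noatime" in options:
        return "noatime"
    if "relatime" in options:
        return "relatime"
    # Assumption: neither flag listed means atime is updated on every access.
    return "strictatime"


def policy_for(path, mounts):
    """Return the policy of the longest mount point that contains path."""
    candidates = [
        mp for mp in mounts
        if path == mp or path.startswith(mp.rstrip("/") + "/")
    ]
    if not candidates:
        return None
    return atime_policy(mounts[max(candidates, key=len)])


# Usage on a Linux machine (the index path here is made up):
#   with open("/proc/mounts") as f:
#       mounts = parse_mount_options(f.read())
#   print(policy_for("/var/lib/myapp/index", mounts))
```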

        Michael McCandless added a comment -

        +1, this is a no-brainer. I had no idea Linux lets you turn off atime per file descriptor!

        It's ridiculous that the OS maintains an atime on our index files.

        Uwe, I agree about the intention of the man page (e.g. back when contrib/benchmark used to write 10,000 files to run its tests and then index them, we could've seen a big speedup), but it still can't hurt to also turn it off when opening the index files for reading.

        I think atime is updated per-read not just at open (http://lkml.org/lkml/1998/12/14/81) though I'm not sure. Even so, it's presumably cached in the OS's write buffer and then only flushed periodically, so I don't think we'll see sizable gains here. But every bit counts so I think we should do it.
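One way to settle the open-versus-read question on a given machine is to watch st_atime around each step. The Python sketch below is only an experiment, with hypothetical helper names. It backdates atime first, on the assumption that a relatime mount only updates an atime that is not newer than mtime. What it reports depends on the filesystem and its mount options, so no particular outcome is claimed here.

```python
import os

# O_NOATIME only exists on Linux; use 0 (no extra flag) elsewhere.
O_NOATIME = getattr(os, "O_NOATIME", 0)


def backdate_atime(path):
    """Set atime one second before mtime so relatime mounts would update it."""
    st = os.stat(path)
    os.utime(path, ns=(st.st_mtime_ns - 10**9, st.st_mtime_ns))


def trace_atime(path, flags=0):
    """Return st_atime_ns (before open, after open, after the first read)."""
    before = os.stat(path).st_atime_ns
    fd = os.open(path, os.O_RDONLY | flags)
    try:
        after_open = os.fstat(fd).st_atime_ns
        os.read(fd, 1)
        after_read = os.fstat(fd).st_atime_ns
    finally:
        os.close(fd)
    return before, after_open, after_read
```

Calling `backdate_atime(path)` and then comparing `trace_atime(path)` with `trace_atime(path, O_NOATIME)` on a non-empty file you own shows at which step, if any, the timestamp moves.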

        Robert Muir added a comment -

        Uwe: I don't interpret it that way!

        I don't think our indexinputs should be doing writes!

        Uwe Schindler added a comment -

        "The option exists specifically for apps like lucene... see the description from the man page!!!!"

        The intention behind the man page is not for the part of the app that manages the index itself (like Lucene). It is for the part of the app that reads files to index them (so that would be the app that uses Lucene and e.g. uses TIKA to read all files; this one should set noatime). The idea is to not mark the file as "accessed" when the virus scanner or the KDE/GNOME file system browser indexes it.

        Simon is right about setting it as a mount option.

        Robert Muir added a comment -

        Also, Simon, I just wanted to say that you need to be root to change the mount options.

        I think this is totally appropriate for us to do, again quoting from the page:

        "This flag is intended for use by indexing or backup programs, where its use can significantly reduce the amount of disk activity."

        Robert Muir added a comment -

        Simon, of course you can, but why not set it? Our indexes don't need the atime for any reason.

        The option exists specifically for apps like lucene... see the description from the man page!!!!

        Simon Willnauer added a comment -

        Robert, you can also control this through mount options, by setting the noatime option on the mount command. Do you think it is absolutely necessary to set this here by default?

        simon

        Robert Muir added a comment -

        All core tests pass with this directory.


          People

          • Assignee:
            Robert Muir
            Reporter:
            Robert Muir