Lucene - Core
LUCENE-6614

IOUtils.spins doesn't work for NVMe drives

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      NVMe is a faster protocol than AHCI for newer SSDs that plug into the PCIe bus.

      I just built a new beast box with one of these drives; Linux names the partition /dev/nvme0n1p1 and the device /dev/nvme0n1 - the device also appears in /sys/block with rotational=0.

      I think Steve Rowe also has an NVME drive ...

      Uwe Schindler (who got the box working for me: thank you!!!) has ideas on how to fix it!

      1. LUCENE-6614.patch
        7 kB
        Uwe Schindler
      2. LUCENE-6614.patch
        4 kB
        Uwe Schindler
      3. LUCENE-6614.patch
        4 kB
        Uwe Schindler

        Issue Links

          Activity

          Dawid Weiss added a comment -

          On a related note, this method also has a problem with any Directory that is wrapped with a delegate (and eventually has an FSDirectory pointing to disk). All such Directories indicate spin=true.

          I don't have any ready solution for this. There are a couple of ways to correct it – move usesSpinningDrive() to be a method on the Directory, for example. Or more generic hasLargeSeekLatency...?

          Uwe Schindler added a comment - - edited

          Here is a quick patch, completely untested (no Linux available at the moment). The idea:

           Instead of stripping off digits from the partition name and doing exists() over and over, I think the better solution is to take what's already there and not use brute force. This patch lists the directory contents of /sys/block and, for each file name it encounters, checks whether the partition we got from the mount point starts with that /sys/block device name. Of course there can be overlap (loop1 vs. loop10), so the longest match is taken.

          Maybe Mike has some time to test, I will do this later in my VirtualBOX. I also added a test for this type of device naming.
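
           The matching idea described above can be sketched roughly like this (a hypothetical simplification, not the actual patch code; the class name `SysBlockMatch` and method `longestMatchingDevice` are made up for illustration):

```java
import java.util.List;

public class SysBlockMatch {
    // Given the partition name from the mount point and the entries of
    // /sys/block, pick the device whose name is the longest prefix of the
    // partition name, so "nvme0n1" beats "nvme0" for partition "nvme0n1p1",
    // and "loop10" is not confused with "loop1".
    public static String longestMatchingDevice(String partition, List<String> sysBlockEntries) {
        String best = null;
        for (String device : sysBlockEntries) {
            if (partition.startsWith(device)
                    && (best == null || device.length() > best.length())) {
                best = device;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // NVMe partition nvme0n1p1 maps to device nvme0n1, not nvme0:
        System.out.println(longestMatchingDevice("nvme0n1p1", List.of("nvme0", "nvme0n1", "sda")));
        // loop10 must not match loop1's sysfs entry:
        System.out.println(longestMatchingDevice("loop10", List.of("loop1", "loop10")));
    }
}
```

           The real code would then read rotational from the matched device's /sys/block entry.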

          Uwe Schindler added a comment -

          I don't have any ready solution for this. There are a couple of ways to correct it – move usesSpinningDrive() to be a method on the Directory, for example. Or more generic hasLargeSeekLatency...?

           +1 Yes!!! I like the name! The current solution is horrible. Directory should have a default method returning true (slow), but any directory like FSDirectory can override it and call IOUtils.spins().

          Dawid Weiss added a comment -

          The upside of moving it to Directory would be that any (arbitrarily complex) delegates would simply delegate further or provide composition (for example – any large seek subdirectory -> return large seek).

          Dawid Weiss added a comment - - edited

           Thinking about it more, I think the method should be explicitly named after what it looks for – we're looking for low-cost random-access drives (a RAM directory or a directory based on tmpfs would also return true), so the method could simply be called Directory.supportsFastRandomAccess().
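
           As a rough sketch of this proposed (not actual) API – all class and method shapes below are simplified illustrations of the idea, not Lucene's real Directory hierarchy:

```java
public class DirectorySketch {
    interface Directory {
        // Conservative default: assume slow (seek-penalized) storage.
        default boolean supportsFastRandomAccess() { return false; }
    }

    // A filesystem-backed directory can override using its own detection
    // (in real Lucene this would consult something like IOUtils.spins(path)).
    static class FSDirectory implements Directory {
        private final boolean spins;
        FSDirectory(boolean spins) { this.spins = spins; }
        @Override public boolean supportsFastRandomAccess() { return !spins; }
    }

    // Wrappers simply delegate, so the answer survives arbitrary nesting.
    static class FilterDirectory implements Directory {
        private final Directory in;
        FilterDirectory(Directory in) { this.in = in; }
        @Override public boolean supportsFastRandomAccess() { return in.supportsFastRandomAccess(); }
    }

    public static void main(String[] args) {
        Directory ssd = new FilterDirectory(new FilterDirectory(new FSDirectory(false)));
        System.out.println(ssd.supportsFastRandomAccess()); // SSD detected through the wrappers
    }
}
```

           The point of the default-plus-delegation design is that a wrapped Directory no longer silently degrades to "spinning".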

          Uwe Schindler added a comment -

           Dawid Weiss: Could you open a separate issue? I am +1 on this approach. I think Robert also discussed this in the original issue, but he wanted a simple and quick solution.

          By the way: spins() returns false on mounts of type tmpfs!

          Dawid Weiss added a comment -

          > By the way: spins() returns false on mounts of type tmpfs!

          Yes, I know. But I think it should return true (if it's possible to detect a ram mount). Like I said – the point here is to detect drives that support fast random-access (nearly zero seek penalty) and there seem to be more options than just the ssd-vs.-rotational separation.

          Robert Muir added a comment -

          -1

          tmpfs is not a spinning disk.

          Uwe Schindler added a comment -

           spins=false is what a RAM mount returns, because the code detects tmpfs! spins=false => good!

          Uwe Schindler added a comment -

           OK, let's move this discussion to the new issue. I think this one here is about detecting SSDs correctly with the new device naming scheme of NVMe devices.

          Dawid Weiss added a comment -

          > tmpfs is not a spinning disk.

          So? Does it have large seek latency? I think the criterion should be: could it run multiple disk operations (like merges) at the same time without additional performance penalty. If it passes then it's just as good as an SSD, regardless of what's behind it.

          Robert Muir added a comment -

          I'm totally against returning anything but false for tmpfs. Sorry.

           We need to get this method out of Lucene. Please propose something to the JDK or something like that.

          Dawid Weiss added a comment -

          > We need to get this method out of lucene.

          This would be an ideal world. I mentioned it in one of my previous comments on another issue – the CMS should be agnostic to hardware and have the ability to maximize throughput for the current system conditions rather than rely on detection of spinning/ non-spinning drives, etc. Unfortunately I don't know how to implement this.

           In the meantime there is a problem that's unsolvable: if you have a custom Directory (or a DirectoryFilter) that eventually reaches the disk, then the CMS will always come up with defaults as if the disk were rotational. There is no solution to this other than cloning the merging strategy.

          Robert Muir added a comment -

           It's not a problem really: the defaults totally work.

           If you have a custom Directory, you are a seriously advanced user, and as one, you can set CMS parameters yourself.

           So I see that as a non-issue.

          Dawid Weiss added a comment -

          > you are a seriously advanced user

          The only reason I know it's working at sub-par performance level is because I am an advanced user... I know the source code.

           I disagree with your opinion, but I respect it, since I understand those low-level detection methods would proliferate. I still have this vision of the CMS just being smart enough to figure out how many concurrent merges it can run without blocking I/O. This really shouldn't be that difficult, since it's essentially a maximization problem of f(count-of-merge-threads, max-throughput-per-thread) = total merge throughput. The problem is how to pick a strategy that is robust enough to handle and recover from intermittent external system loads, disk/system buffer spills, etc. An interesting research problem, really.

          Robert Muir added a comment -

           Writing a custom directory is incredibly risky. Let's be honest, it's really something that should be avoided at all costs; it's almost asking for index corruption.

           Look at the "simple" ones Lucene ships with; we have a hard enough time keeping those working correctly.

           So I really have absolutely zero sympathy for someone who writes a custom directory. They can set CMS parameters. And they always should if they care. The current logic does not work on Windows or OS X, for example.

           It's a statistical hack: it improves defaults because the vast majority of users are on Linux, and the vast majority don't write custom directories. Let's just leave it as a hack and please not put it on Directory. Methods on Directory are very expensive to maintain; they are incredibly hard to remove.

          Michael McCandless added a comment -

          To verify there really is a bug here (sorry I should have done this in the OP), I installed Elasticsearch 2.0 pre-release and started it and I see this in its startup logging output:

          [2015-06-26 09:17:54,024][INFO ][env                      ] [Windeagle] using [1] data paths, mounts [[/ (/dev/disk/by-uuid/d7b5a26e-bf10-4789-8443-9fd98798c554)]], net usable_space [364.3gb], net total_space [366.6gb], spins? [possibly], types [ext4]
          

           So indeed the bug is real (spins? should say "no", not "possibly"), but the plot thickens (maybe?) a bit because of this /dev/disk/by-uuid/... mount point ... or maybe our existing heuristics already deal with this dereference ...

          Robert Muir added a comment -

          The best way to improve this is to write a unit test for your drive. See TestIOUtils where you can mock up things like /sys and /dev.

          Uwe Schindler added a comment -

           I think we are talking about different issues here. This one is about the problem of detecting an NVMe SSD, which should work but doesn't, because of the way NVMe partition names incorporate the device name. We are not talking about Directory or whether tmpfs spins. Please move that to another issue!

           This issue should fix the NVMe device naming and make the brute-force stripping of partition numbers work better (instead, it checks what is there and matches the /sys/block directory contents against the device name). So no brute-forcing, and it also works with NVMe.

          The attached patch passes tests on my local Linux computer, so it should really work. Mike, did you check my patch and verify that it works?

          /dev/disk/by-uuid/

          That should work, because the IOUtils.spins() code dereferences symbolic links. So just try my patch, please.
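
           A minimal illustration of the symlink dereferencing being relied on here (not Lucene's actual code; it only shows that java.nio's Path.toRealPath() follows symbolic links, which is why a by-uuid mount entry resolves to the real partition path; file names are made up):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SymlinkDemo {
    // Returns true when a symlink resolves to the same real path as its
    // target, mimicking how /dev/disk/by-uuid/... resolves to the partition.
    public static boolean resolvesThroughLink() throws IOException {
        Path dir = Files.createTempDirectory("symlink-demo");
        Path target = Files.createFile(dir.resolve("nvme0n1p2"));
        Path link = Files.createSymbolicLink(dir.resolve("by-uuid-link"), target);
        // toRealPath() dereferences the symbolic link to the real file:
        return link.toRealPath().equals(target.toRealPath());
    }

    public static void main(String[] args) throws IOException {
        System.out.println(resolvesThroughLink()); // true
    }
}
```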

          Uwe Schindler added a comment -

          The best way to improve this is to write a unit test for your drive. See TestIOUtils where you can mock up things like /sys and /dev.

           See the attached patch. I added a unit test and fixed IOUtils.spins to work better when stripping partition numbers. It passes locally; I am just waiting for Mike to confirm. Please also have a look at my patch and test!

          Robert Muir added a comment -

          +1, thank you Uwe. I like the test and also like the code cleanups.

          Steve Rowe added a comment -

          I think Steve Rowe also has an NVME drive ...

          Yes, I do.

          In fact, Debian 8 failed to install properly because of the new NVMe device naming scheme, and I had to patch grub-installer and initramfs-tools to deal with it - see the bug reports I created: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785147 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785149.

          In the grub-installer script's device-to-disk method, which is a longish regex to strip off suffixes based on disk prefixes, there is a comment "# This should probably be rewritten using udevadm or similar."

          On my system, udevadm info /dev/nvme0n1p2 (the root mount point) says:

          P: /devices/pci0000:00/0000:00:02.0/0000:02:00.0/nvme/nvme0/nvme0n1/nvme0n1p2
          N: nvme0n1p2
          S: disk/by-partuuid/7c4cb587-74ba-471f-8a6d-8b5d94c3c86b
          S: disk/by-uuid/9954878e-5783-4f4b-bca3-55d689f221da
          E: DEVLINKS=/dev/disk/by-partuuid/7c4cb587-74ba-471f-8a6d-8b5d94c3c86b /dev/disk/by-uuid/9954878e-5783-4f4b-bca3-55d689f221da
          E: DEVNAME=/dev/nvme0n1p2
          E: DEVPATH=/devices/pci0000:00/0000:00:02.0/0000:02:00.0/nvme/nvme0/nvme0n1/nvme0n1p2
          E: DEVTYPE=partition
          E: ID_FS_TYPE=ext4
          E: ID_FS_USAGE=filesystem
          E: ID_FS_UUID=9954878e-5783-4f4b-bca3-55d689f221da
          E: ID_FS_UUID_ENC=9954878e-5783-4f4b-bca3-55d689f221da
          E: ID_FS_VERSION=1.0
          E: ID_PART_ENTRY_DISK=259:0
          E: ID_PART_ENTRY_NUMBER=2
          E: ID_PART_ENTRY_OFFSET=1050624
          E: ID_PART_ENTRY_SCHEME=gpt
          E: ID_PART_ENTRY_SIZE=646338560
          E: ID_PART_ENTRY_TYPE=0fc63daf-8483-4772-8e79-3d69d8477de4
          E: ID_PART_ENTRY_UUID=7c4cb587-74ba-471f-8a6d-8b5d94c3c86b
          E: ID_PART_TABLE_TYPE=gpt
          E: ID_PART_TABLE_UUID=c454b1a9-e191-4d93-a9af-db80b81485a8
          E: MAJOR=259
          E: MINOR=2
          E: SUBSYSTEM=block
          E: TAGS=:systemd:
          E: USEC_INITIALIZED=11060677
          

          Similarly, taking the /by-uuid/ path from E: DEVLINKS above, udevadm info /dev/disk/by-uuid/9954878e-5783-4f4b-bca3-55d68 says:

          P: /devices/pci0000:00/0000:00:02.0/0000:02:00.0/nvme/nvme0/nvme0n1/nvme0n1p2
          N: nvme0n1p2
          S: disk/by-partuuid/7c4cb587-74ba-471f-8a6d-8b5d94c3c86b
          S: disk/by-uuid/9954878e-5783-4f4b-bca3-55d689f221da
          E: DEVLINKS=/dev/disk/by-partuuid/7c4cb587-74ba-471f-8a6d-8b5d94c3c86b /dev/disk/by-uuid/9954878e-5783-4f4b-bca3-55d689f221da
          E: DEVNAME=/dev/nvme0n1p2
          E: DEVPATH=/devices/pci0000:00/0000:00:02.0/0000:02:00.0/nvme/nvme0/nvme0n1/nvme0n1p2
          E: DEVTYPE=partition
          E: ID_FS_TYPE=ext4
          E: ID_FS_USAGE=filesystem
          E: ID_FS_UUID=9954878e-5783-4f4b-bca3-55d689f221da
          E: ID_FS_UUID_ENC=9954878e-5783-4f4b-bca3-55d689f221da
          E: ID_FS_VERSION=1.0
          E: ID_PART_ENTRY_DISK=259:0
          E: ID_PART_ENTRY_NUMBER=2
          E: ID_PART_ENTRY_OFFSET=1050624
          E: ID_PART_ENTRY_SCHEME=gpt
          E: ID_PART_ENTRY_SIZE=646338560
          E: ID_PART_ENTRY_TYPE=0fc63daf-8483-4772-8e79-3d69d8477de4
          E: ID_PART_ENTRY_UUID=7c4cb587-74ba-471f-8a6d-8b5d94c3c86b
          E: ID_PART_TABLE_TYPE=gpt
          E: ID_PART_TABLE_UUID=c454b1a9-e191-4d93-a9af-db80b81485a8
          E: MAJOR=259
          E: MINOR=2
          E: SUBSYSTEM=block
          E: TAGS=:systemd:
          E: USEC_INITIALIZED=11060677
          

          In both cases, E: DEVPATH has the correct block device name as the second-to-last path segment.
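
           That observation could be exploited with a tiny helper (purely illustrative; this is not what the Lucene patch does, and the class/method names are made up): take the second-to-last segment of DEVPATH as the block device name.

```java
public class DevPathParse {
    // Extracts the block device name from a udev DEVPATH, where it is the
    // second-to-last path segment (e.g. ".../nvme0/nvme0n1/nvme0n1p2").
    public static String blockDevice(String devPath) {
        String[] segments = devPath.split("/");
        return segments[segments.length - 2];
    }

    public static void main(String[] args) {
        System.out.println(blockDevice(
            "/devices/pci0000:00/0000:00:02.0/0000:02:00.0/nvme/nvme0/nvme0n1/nvme0n1p2"));
        // prints: nvme0n1
    }
}
```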

          Steve Rowe added a comment -

          /dev/disk/by-uuid/

          That should work, because the IOUtils.spins() code dereferences symbolic links. So just try my patch, please.

          Looks to me like it will work:

          $ ls -l /dev/disk/by-uuid/9954878e-5783-4f4b-bca3-55d689f221da     
          lrwxrwxrwx 1 root root 15 Jun 23 18:20 /dev/disk/by-uuid/9954878e-5783-4f4b-bca3-55d689f221da -> ../../nvme0n1p2
          
          Uwe Schindler added a comment -

           Improved patch. This one adds some fake files to /sys/block to verify that it takes the longest match against the device name from the mount point.

          To me this looks good, just waiting for comments & confirmation from Mike.

          Uwe Schindler added a comment -

           Steve Rowe: Can you also test whether the patch returns spins=false for the NVMe device on your machine?

           By the way: Mike and I also had problems booting from NVMe, but in his case the BIOS did not detect the drive and was not able to boot from it. The Ubuntu installer was able to handle the device name, but we were not able to boot. I helped him find the fix (from my experience in earlier days, when I had a DEC Alpha machine with SATA disks): use an (Elastic-)USB stick and make the Ubuntu installer mount it as /boot, so Grub and the kernel went onto it. That way the kernel and the Grub bootloader are loaded from the USB drive, and it chains to boot with NVMe as the root mount. Ubuntu kernel updates also worked perfectly, because /boot is a mount point.

          Steve Rowe added a comment - - edited

           Mike and I had also problems with booting from NVMe, but in his case the BIOS did not detect it and was not able to boot from it.

           I had similar issues, and ASUS support (ASUS mobo) actually told me that booting from the NVMe drive wasn't possible... It turns out all I had to do was select the "UEFI: optical drive" option as the first boot device in the BIOS, then boot into the installation DVD. I selected GPT partitions from the installer's partitioning menu, made the PCIe SSD's main OS partition bootable, and installed the Grub boot loader. No chaining required.

          Michael McCandless added a comment -

          I tested the patch with a small standalone Java static main, and confirmed I get "true" before, and "false" with the patch! Thank you Uwe Schindler!

          Steve Rowe added a comment -

          I tested with the following in TestIOUtils:

            public void testRealNVME() throws Exception {
              Path dir = createTempDir();
              dir = FilterPath.unwrap(dir).toRealPath();
              assertFalse(IOUtils.spinsLinux(dir));
            }
          

          Which caused:

            [junit4] ERROR   0.15s | TestIOUtils.testRealNVME <<<
             [junit4]    > Throwable #1: java.security.AccessControlException: access denied ("java.io.FilePermission" "/dev/nvme0n1p2" "read")
             [junit4]    >        at __randomizedtesting.SeedInfo.seed([45BE167DAA8713BE:C4DF5928E609489]:0)
             [junit4]    >        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:457)
             [junit4]    >        at java.security.AccessController.checkPermission(AccessController.java:884)
             [junit4]    >        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
             [junit4]    >        at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
             [junit4]    >        at sun.nio.fs.UnixPath.checkRead(UnixPath.java:795)
             [junit4]    >        at sun.nio.fs.UnixPath.toRealPath(UnixPath.java:827)
             [junit4]    >        at org.apache.lucene.util.IOUtils.spinsLinux(IOUtils.java:495)
             [junit4]    >        at org.apache.lucene.util.TestIOUtils.testRealNVME(TestIOUtils.java:361)
          

          I added -Dtests.useSecurityManager=false to the ant cmdline, and the test passed. When I reverted the patch, the test failed.

          So it works for me.

          Uwe Schindler added a comment -

          Hi,
          I added another test: it creates an Ubuntu-like mount point with a UUID and sets up the matching symlinks. This only works on file systems that support symlinks, so there is an additional assume (not all file systems on Linux support symlinks... I have a remote Samba mount here to my Windows box, and it did not!). For this to work I added an additional policy entry.

          The test emulates the standard case: mount point points to /dev/disk/by-uuid/...

          I think the patch is ready. Robert Muir, are you fine with the new test using symlinks?
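          (The scenario that test emulates, resolving a by-uuid symlink back to the real partition device before the /sys/block lookup, can be sketched roughly as below. The paths and names are made up for illustration; this is not the actual test code:)

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByUuidDemo {
  // Follows an Ubuntu-style /dev/disk/by-uuid/<uuid> symlink back to the
  // underlying partition device name, as the spins() check must do before
  // it can look the device up under /sys/block.
  static Path resolveDevice(Path byUuidLink) throws IOException {
    return byUuidLink.toRealPath().getFileName();
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempDirectory("byuuid");
    Path device = Files.createFile(tmp.resolve("nvme0n1p1"));
    Path link = tmp.resolve("c0ffee-uuid"); // illustrative UUID name
    try {
      Files.createSymbolicLink(link, device);
    } catch (UnsupportedOperationException | IOException e) {
      // mirrors the assume in the test: not every file system supports symlinks
      System.out.println("filesystem does not support symlinks, skipping");
      return;
    }
    System.out.println(resolveDevice(link)); // prints nvme0n1p1
  }
}
```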

          Steve Rowe added a comment -

          Here's the failure message for my test without the patch (the trailing "p" shouldn't be there on the block device name):

             [junit4] ERROR   0.08s | TestIOUtils.testRealNVME <<<
             [junit4]    > Throwable #1: java.nio.file.NoSuchFileException: /sys/block/nvme0n1p/queue/rotational
          
          Uwe Schindler added a comment -

          Thanks Steve, that was exactly the problem! The "simply strip trailing digits off the device name" approach does not work for the new device naming scheme.
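          (The naming difference is easy to see in a sketch. The helper below is purely illustrative and is not the code from the patch:)

```java
public class DeviceNames {
  // Maps a partition name to its block device name. Classic devices drop
  // their trailing partition digits ("sda1" -> "sda"), but NVMe partitions
  // follow the nvme<ctrl>n<namespace>p<partition> scheme and must only drop
  // the "p<digits>" suffix ("nvme0n1p1" -> "nvme0n1"). Blindly stripping
  // digits yields the nonexistent "nvme0n1p", hence the NoSuchFileException
  // on /sys/block/nvme0n1p/queue/rotational.
  static String blockDevice(String partition) {
    if (partition.matches("nvme\\d+n\\d+p\\d+")) {
      return partition.replaceFirst("p\\d+$", "");
    }
    return partition.replaceFirst("\\d+$", "");
  }

  public static void main(String[] args) {
    System.out.println(blockDevice("sda1"));      // sda
    System.out.println(blockDevice("nvme0n1p1")); // nvme0n1
  }
}
```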

          Uwe Schindler added a comment -

          I had similar issues, and ASUS support (ASUS mobo) actually told me that booting the NVMe drive wasn't possible.... Turns out all I had to do was select the "UEFI:optical drive" option as the first boot device from the BIOS, then boot into the installation DVD. I selected GPT partitions from the installer's partitioning menu, made the PCIe-SSD main OS partition bootable, and installed the Grub boot loader. No chaining required.

          This would have been our "other option". Michael McCandless, maybe try installing Ubuntu on a GPT-formatted disk instead of old-style BIOS partitions. We had that in mind, but to us it looked like the device was not seen at all. Maybe the BIOS only looks for devices with GPT partitions and does not even show those with an 80s-style BIOS partition table.

          Robert Muir added a comment -

          I think patch is ready. Robert Muir, are you fine with the new test using symlinks?

          The check is wrong. It can actually work on Windows if you have system privileges, among other reasons. Please do it like this:

          try {
            Files.createSymbolicLink(x, y);
          } catch (UnsupportedOperationException | IOException e) {
            assumeNoException("test requires filesystem that supports symbolic links", e);
          }
          
          Steve Rowe added a comment -

          Maybe the BIOS just looks for devices with GPT partitions and does not even show those with 80s-style BIOS

          My BIOS shows my single optical drive as if it were two: once as UEFI and once as "regular". AFAICT, simply booting into the media on the optical drive in UEFI mode allowed the BIOS to see the NVMe drive.

          Robert Muir added a comment -

          Also, the link permission is maybe a little dangerous for people. If there is a bug in the test, it could potentially cause real harm.

          Maybe it should have an additional catch block for SecurityException like so:

          } catch (SecurityException e) {
            assumeNoException("test cannot create symbolic links with security manager enabled", e);
          }
          
          Uwe Schindler added a comment -

          The check is wrong. It can actually work on windows if you have system privs to do it, among other reasons. Please do it like this:

          Yes. But my check was exactly like that...? Actually I changed it to this exact code; I just had the exceptions in the other order.

          We currently don't run these tests on Windows at all (all of those spins tests are assumeFalse'd). The reason I added the catch/assume was to make sure it also works if some Linux file system does not have symlinks... (I tested on Linux, where it failed on a mounted Samba FS with my checkout.) But the test of course passed on a local FS.

          Robert Muir added a comment -

          We don't run test currently on Windows at all (all of those spins tests are assumeFalsed).

          Not all of them: only 3 of them. And I don't know if those assumes are valid; maybe they do in fact work on Windows? It's all done with mocks.

          Uwe Schindler added a comment -

          They don't work on Windows, I checked already. Maybe I can look into this, but the chrooting somehow fails. It's unrelated.

          I just added the assume for the symlinking, because this also sometimes fails on Linux depending on the file system (I ran the tests with my checkout on a remote Samba mount).

          ASF subversion and git services added a comment -

          Commit 1687883 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1687883 ]

          LUCENE-6614: Improve partition detection in IOUtils#spins() so it works with NVMe drives

          ASF subversion and git services added a comment -

          Commit 1687884 from Uwe Schindler in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1687884 ]

          Merged revision(s) 1687883 from lucene/dev/trunk:
          LUCENE-6614: Improve partition detection in IOUtils#spins() so it works with NVMe drives

          Uwe Schindler added a comment -

          If I have some time, I will look into the chroot code of the MockFileSystem so I can make it work on Windows. Currently it breaks because of unexpected drive letters.

          Shalin Shekhar Mangar added a comment -

          Bulk close for 5.3.0 release


            People

            • Assignee: Uwe Schindler
            • Reporter: Michael McCandless
            • Votes: 0
            • Watchers: 4

            Dates

            • Created:
            • Updated:
            • Resolved: