  1. Apache NiFi
  2. NIFI-3313

First deployment of NiFi can hang on VMs without sufficient entropy if using /dev/random


Details

    Description

      Analysis of Issue

      Statement of Problem:

      NiFi deployed on a headless VM (little user interaction by way of keyboard and mouse I/O) can reportedly take 5-10 minutes to start up. The user reports that this occurs on a "secure" cluster. Further examination is required to determine which specific process requires the large amount of random input, because no steps to reproduce, configuration files, logs, or VM environment information were provided.

      Context

      The likely cause of this issue is that a process is attempting to read from /dev/random, a *nix "device" that exposes a pseudo-random number generator (PRNG). Also available is /dev/urandom, a related PRNG device. Despite a common misconception, /dev/urandom is not "less secure" than /dev/random for general use. /dev/random blocks if the entropy estimate (a "guess" at how much entropy has been introduced into the pool) is lower than the amount of random data requested by the caller. In contrast, /dev/urandom does not block, but provides the output of the same cryptographically secure PRNG (CSPRNG) that /dev/random reads from [myths]. After as little as 256 bytes of initial seeding, reading from /dev/random and reading from /dev/urandom are functionally equivalent, because the CSPRNG can generate a very long stream of output without needing to be re-seeded before more entropy becomes available.
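
      To see whether a VM is in the low-entropy state described above, one option is to read the kernel's entropy estimate directly. The following is a minimal sketch, assuming a Linux host that exposes /proc/sys/kernel/random/entropy_avail; the class and method names are illustrative and not part of NiFi, and the reported value and its meaning vary across kernel versions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

// Hypothetical helper, not part of NiFi.
class EntropyCheck {
    // Linux-specific file holding the kernel's entropy estimate, in bits.
    static final Path ENTROPY_AVAIL =
            Paths.get("/proc/sys/kernel/random/entropy_avail");

    // Returns the estimate read from the given file, or empty if the file
    // is missing, unreadable, or does not contain an integer.
    static OptionalInt readEntropyEstimate(Path file) {
        try {
            byte[] raw = Files.readAllBytes(file);
            String text = new String(raw, StandardCharsets.US_ASCII).trim();
            return OptionalInt.of(Integer.parseInt(text));
        } catch (IOException | NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt bits = readEntropyEstimate(ENTROPY_AVAIL);
        if (bits.isPresent()) {
            System.out.println("Kernel entropy estimate: " + bits.getAsInt() + " bits");
        } else {
            System.out.println("Entropy estimate not available on this system");
        }
    }
}
```

      A persistently low value while NiFi is starting would be consistent with a blocked read from /dev/random, although it does not by itself identify which process is blocked.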

      As mentioned earlier, further examination is required to determine if the process requiring random input occurs at application boot or only at "machine" (hardware or VM) boot. On the first deployment of the system with certificates, the certificate generation process will require substantial random input. However, on application launch and connection to a cluster, even the TLS/SSL protocol requires some amount of random input.

      Proposed Solutions

      rngd

      A software toolset called rng-tools [rngtools] exists for Linux for accessing dedicated hardware RNGs (true RNGs, or TRNGs). Specialized hardware, as well as Intel chips from Ivy Bridge (2012) onward, can provide hardware-generated random input to the kernel. Using the rngd daemon to seed the entropy pool behind /dev/random and /dev/urandom is the simplest solution.

      Note: Do not use /dev/urandom to seed /dev/random using rngd. This is like running a garden hose from a car's exhaust back into its gas tank and trying to drive.
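
      Before relying on rngd, it may help to confirm that the host actually exposes a hardware source. The sketch below assumes a Linux host and looks for two common indicators: the "rdrand" CPU flag in /proc/cpuinfo and the /dev/hwrng device node. The class and method names are hypothetical, and the absence of both indicators does not prove that no TRNG is present.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of NiFi or rng-tools.
class HardwareRngCheck {
    // Returns true if any "flags" line in the given /proc/cpuinfo text
    // lists the rdrand instruction.
    static boolean hasRdrandFlag(String cpuinfo) {
        for (String line : cpuinfo.split("\n")) {
            if (!line.startsWith("flags")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String[] flags = line.substring(colon + 1).trim().split("\\s+");
            if (Arrays.asList(flags).contains("rdrand")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        try {
            byte[] raw = Files.readAllBytes(Paths.get("/proc/cpuinfo"));
            String cpuinfo = new String(raw, StandardCharsets.US_ASCII);
            System.out.println("rdrand flag present: " + hasRdrandFlag(cpuinfo));
        } catch (IOException e) {
            System.out.println("Could not read /proc/cpuinfo: " + e.getMessage());
        }
        System.out.println("/dev/hwrng present: " + Files.exists(Paths.get("/dev/hwrng")));
    }
}
```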

      Instruct Java to use /dev/urandom

      The Java Runtime Environment (JRE) can be instructed to use /dev/urandom for all invocations of SecureRandom, either on a per-Java process basis [jdk-urandom] or in the JVM configuration [oracle-urandom], which means it will not block on server startup. The NiFi bootstrap.conf file can be modified to contain an additional Java argument directing the JVM to use /dev/urandom.
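
      In bootstrap.conf, the additional argument would take the form java.arg.N=-Djava.security.egd=file:/dev/urandom, where N is any unused argument index (the index is an assumption and depends on the existing file). To confirm that a JVM picked up the setting, the following sketch prints the relevant properties and times a SecureRandom call. The class and method names are illustrative, and the timing is only a rough indicator of blocking.

```java
import java.security.SecureRandom;
import java.security.Security;

// Hypothetical diagnostic, not part of NiFi.
class SecureRandomSourceCheck {
    // Reports the per-process override (java.security.egd) and the
    // JVM-wide default from the java.security file (securerandom.source).
    static String describeEntropyConfig() {
        String egd = System.getProperty("java.security.egd");
        String source = Security.getProperty("securerandom.source");
        return "java.security.egd=" + egd + ", securerandom.source=" + source;
    }

    // Draws n bytes from a freshly constructed SecureRandom.
    static byte[] randomBytes(int n) {
        byte[] out = new byte[n];
        new SecureRandom().nextBytes(out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeEntropyConfig());
        long start = System.nanoTime();
        randomBytes(32);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("32 random bytes generated in " + elapsedMs + " ms");
    }
}
```

      Running this with and without -Djava.security.egd=file:/dev/urandom on the affected VM is one way to compare behavior; a null value for java.security.egd means the per-process override was not supplied.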

      Other Solutions

      Entropy Gathering Tools

      Tools that gather entropy from non-standard sources (audio card noise, video capture from webcams, etc.), such as audio-entropyd [wagner], have been developed, but these tools are not verified or well examined. When they are tested, it is usually only for the strength of their PRNG, not for the ability of the tool to capture entropy and generate random data that is unpredictable to an attacker who may be able to determine the internal state.

      haveged

      A solution has been proposed to use haveged [haveged], a user-space daemon relying on the HAVEGE (HArdware Volatile Entropy Gathering and Expansion) construct to continually increase the entropy on the system, allowing /dev/random to run without blocking.

      However, on further investigation, multiple sources indicate this solution may be insecure [dice][leek-havege].

      Michael Kerrisk:

      Having read a number of papers about HAVEGE, Peter [Anvin] said he had been unable to work out whether this was a "real thing". Most of the papers that he has read run along the lines, "we took the output from HAVEGE, and ran some tests on it and all of the tests passed". The problem with this sort of reasoning is the point that Peter made earlier: there are no tests for randomness, only for non-randomness.

      One of Peter's colleagues replaced the random input source employed by HAVEGE with a constant stream of ones. All of the same tests passed. In other words, all that the test results are guaranteeing is that the HAVEGE developers have built a very good PRNG. It is possible that HAVEGE does generate some amount of randomness, Peter said. But the problem is that the proposed source of randomness is simply too complex to analyze; thus it is not possible to make a definitive statement about whether it is truly producing randomness. (By contrast, the HWRNGs that Peter described earlier have been analyzed to produce a quantum theoretical justification that they are producing true randomness.) "So, while I can't really recommend it, I can't not recommend it either." If you are going to run HAVEGE, Peter strongly recommended running it together with rngd, rather than as a replacement for it.

      Tom Leek:

      Of course, the whole premise of HAVEGE is questionable. For any practical security, you need a few "real random" bits, no more than 200, which you use as seed in a cryptographically secure PRNG. The PRNG will produce gigabytes of pseudo-[data] indistinguishable from true randomness, and that's good enough for all practical purposes.

      Insisting on going back to the hardware for every bit looks like yet another outbreak of that flawed idea which sees entropy as a kind of gasoline, which you burn up when you look at it.

      Next Steps

      As described above, further investigation is necessary, but moving forward, barring new information, I would propose directing the JVM to use /dev/urandom and making rngd available to systems that support a TRNG.

      [myths] http://www.2uo.de/myths-about-urandom/
      [rngtools] https://git.kernel.org/cgit/utils/kernel/rng-tools/rng-tools.git/about/
      [jdk-urandom] http://stackoverflow.com/a/2325109/70465
      [oracle-urandom] https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmrand.html
      [wagner] https://people.eecs.berkeley.edu/~daw/rnd/
      [haveged] http://www.issihosts.com/haveged/
      [dice] https://lwn.net/Articles/525459/
      [leek-havege] http://security.stackexchange.com/a/34552/16485

          People

            alopresto Andy LoPresto