  1. IMPALA
  2. IMPALA-5754

rand() algorithm is very non-random


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • Impala 2.12.0
    • Backend

    Description

      MathFunctions::Rand includes the line *seed = rand_r(seed);. I think this is an incorrect use of rand_r: the function already updates the seed through its pointer argument during the call, so the result doesn't need to be assigned back to it. Assigning it overwrites the generator state with the returned value and produces very non-random output. The following program, which simulates this pattern, typically loops after fewer than 20k distinct items, while a good PRNG would produce somewhere in the neighborhood of RAND_MAX/2 items before looping.

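      #include <cstdlib>
      #include <unordered_set>
      #include <iostream>
      
      using namespace std;
      
      int main() {
        unsigned int seed;
        // Read one starting seed per input token.
        while(cin >> seed) {
          // Every seed value seen so far for this starting seed.
          unordered_set<unsigned int> history;
          // Stop as soon as a seed value repeats, i.e. the sequence has looped.
          while (history.find(seed) == history.end()) {
            history.insert(seed);
            // Simulates the suspect pattern: the return value overwrites the seed.
            seed = rand_r(&seed);
            // Progress output whenever the count is a power of two.
            if (0 == (history.size() & (history.size() - 1))) {
              cout << history.size() << endl;
            }
          }
          // Number of distinct values seen before the first repeat.
          cout << history.size() << endl;
        }
      }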
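
To make the difference concrete, here is a minimal sketch contrasting the two call patterns. The helper names and the `limit` cap are hypothetical, chosen for illustration only, and are not anything in Impala. The first helper feeds the return value back in as the seed, as in the program above; the second lets rand_r advance the seed through the pointer and discards the return value. My understanding is that the feedback version turns the generator into repeated application of a non-invertible function of the state, which would explain why it tends to fall into short cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Hypothetical helper: number of distinct seed values visited before a
// repeat (or before `limit` is reached) when the return value of rand_r
// overwrites the seed, as in `seed = rand_r(&seed)`.
std::size_t CycleLengthWithAssignment(unsigned int seed, std::size_t limit) {
  assert(limit > 0);
  std::unordered_set<unsigned int> history;
  // insert().second is false once a seed value has been seen before.
  while (history.size() < limit && history.insert(seed).second) {
    seed = rand_r(&seed);
  }
  return history.size();
}

// Hypothetical helper: same measurement, but the seed is only modified by
// rand_r itself through the pointer; the return value is discarded.
std::size_t DistinctStatesWithoutAssignment(unsigned int seed,
                                            std::size_t limit) {
  assert(limit > 0);
  std::unordered_set<unsigned int> history;
  while (history.size() < limit && history.insert(seed).second) {
    (void)rand_r(&seed);
  }
  return history.size();
}
```

Calling both with the same seed and a limit of, say, 1 << 20 should show the first stopping early on a repeat while the second runs up to the limit; exact counts depend on the libc's rand_r implementation.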
      

      In any case, we should drop the use of rand_r; see IMPALA-4954.
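
As one possible direction for dropping rand_r, and not necessarily what IMPALA-4954 proposes or what the eventual fix adopted, the state could live in a C++11 <random> engine that is advanced in place. The sketch below assumes the caller wants a double in [0, 1); `RandState` and `NextDouble` are hypothetical names, not Impala's actual types or functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical stand-in for per-call-site generator state.
struct RandState {
  std::mt19937_64 engine;
  explicit RandState(std::uint64_t seed) : engine(seed) {}
};

// Returns a double in [0, 1) and advances the engine in place.
// There is no separate seed to reassign: the engine owns its state.
double NextDouble(RandState* state) {
  assert(state != nullptr);
  // Keep the top 53 bits so the value converts to double exactly,
  // then scale by 2^-53 to land in [0, 1).
  const std::uint64_t bits = state->engine() >> 11;
  return std::ldexp(static_cast<double>(bits), -53);
}
```

Two `RandState` objects built from the same seed yield the same sequence, so a seeded call stays reproducible, while the engine's large internal state avoids the short cycles described above.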


            People

              jinchul Jin Chul Kim
              jbapple Jim Apple
