  1. IMPALA
  2. IMPALA-5754

rand() algorithm is very non-random

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 2.12.0
    • Component/s: Backend

      Description

      MathFunctions::Rand includes the line *seed = rand_r(seed);. I think this is an incorrect use of rand_r: the function updates the seed through its pointer argument during the call, so there is no need to assign the return value back to it. Doing so overwrites the generator's state with its own output, which produces very non-random results. The following program, which simulates this pattern, typically loops after fewer than 20k distinct values, while a good PRNG would produce somewhere in the neighborhood of RAND_MAX/2 values before looping.

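      #include <cstdlib>  // rand_r (POSIX)
      #include <iostream>
      #include <unordered_set>
      
      using namespace std;
      
      // For each seed read from stdin, count how many distinct values appear
      // before the sequence revisits one.
      int main() {
        unsigned int seed;
        while (cin >> seed) {
          unordered_set<unsigned int> history;
          while (history.find(seed) == history.end()) {
            history.insert(seed);
            // Deliberately mirrors MathFunctions::Rand: the return value
            // overwrites the state that rand_r has just updated.
            seed = rand_r(&seed);
            // Report progress whenever the count reaches a power of two.
            if (0 == (history.size() & (history.size() - 1))) {
              cout << history.size() << endl;
            }
          }
          // Total number of distinct values seen before the first repeat.
          cout << history.size() << endl;
        }
      }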
      

      In any case, we should drop the use of rand_r; see IMPALA-4954.
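
      To make the difference between the two call patterns concrete, here is a minimal sketch that puts them side by side. BuggyNext, PlainNext and DistinctStates are hypothetical names chosen for illustration; they are not taken from the Impala source. The sketch assumes a POSIX platform that provides rand_r.

```cpp
#include <cstddef>
#include <cstdlib>  // rand_r (POSIX)
#include <unordered_set>

// Hypothetical helper mirroring the pattern described in this report:
// the return value of rand_r overwrites the state it has just advanced.
unsigned int BuggyNext(unsigned int* seed) {
  *seed = rand_r(seed);
  return *seed;
}

// Hypothetical helper showing plain usage: rand_r advances *seed itself,
// and the return value is used only as the output.
unsigned int PlainNext(unsigned int* seed) {
  return rand_r(seed);
}

// Counts how many distinct states are visited before one repeats,
// giving up once `limit` distinct states have been recorded.
template <typename Step>
std::size_t DistinctStates(unsigned int seed, std::size_t limit, Step step) {
  std::unordered_set<unsigned int> seen;
  while (seen.size() < limit && seen.insert(seed).second) {
    step(&seed);  // advance the state using the chosen call pattern
  }
  return seen.size();
}
```

      Calling DistinctStates(seed, limit, BuggyNext) and DistinctStates(seed, limit, PlainNext) with the same seed and a generous limit should show the first stopping well before the second, although the exact counts depend on the platform's rand_r implementation.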

              People

              • Assignee:
                jinchul Jinchul Kim
              • Reporter:
                jbapple Jim Apple