Commons Math
  1. Commons Math
  2. MATH-1019

"shuffle" method broken (in class "o.a.c.m.random.RandomDataGenerator")

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3
    • Labels:
      None

      Description

      The method does not abide by its contract: elements before the "end" index are included in the shuffle.

      /**
       * Uses a 2-cycle permutation shuffle to randomly re-order the last elements 
       * of list.
       *
       * @param list list to be shuffled
       * @param end element past which shuffling begins
       */
      private void shuffle(int[] list, int end) {
          int target = 0;
          for (int i = list.length - 1; i >= end; i--) {
              if (i == 0) { // XXX "0" should be "end"
                  target = 0; // XXX "0" should be "end"
              } else {
                  // NumberIsTooLargeException cannot occur
                  target = nextInt(0, i); // XXX "0" should be "end"
              }
              int temp = list[target];
              list[target] = list[i];
              list[i] = temp;
          }
      }
      

      I'm going to introduce the above corrections in the new implementation to be located in "MathArrays" (cf. issue MATH-1010).

        Issue Links

          Activity

          Gilles created issue -
          Gilles made changes -
          Field Original Value New Value
          Summary "shuffle" method broken (in class "o.a.c.m.RandomDataGenerator") "shuffle" method broken (in class "o.a.c.m.random.RandomDataGenerator")
          Gilles made changes -
          Link This issue relates to MATH-1010 [ MATH-1010 ]
          Hide
          Gilles added a comment -

          In "RandomDataGenerator", I tried to replace the use of "shuffle" that appears in "nextPermutation(int n, int k)" (at line 642) by a call to the new "MathArrays.shuffle" (cf. MATH-1010) but this makes the "testNextSample" unit test fail!

          Show
          Gilles added a comment - In "RandomDataGenerator", I tried to replace the use of "shuffle" that appears in "nextPermutation(int n, int k)" (at line 642) by a call to the new "MathArrays.shuffle" (cf. MATH-1010 ) but this makes the "testNextSample" unit test fail!
          Hide
          Gilles added a comment -

          With the replacement, the unit test "testNextPermutation()" passes but it uses a fully shuffled array where there is no difference in behaviour between the old (in "RandomDataGenerator") and the fixed (in "MathArrays") code.

          Show
          Gilles added a comment - With the replacement, the unit test "testNextPermutation()" passes but it uses a fully shuffled array where there is no difference in behaviour between the old (in "RandomDataGenerator") and the fixed (in "MathArrays") code.
          Gilles made changes -
          Link This issue relates to MATH-1020 [ MATH-1020 ]
          Hide
          Phil Steitz added a comment -

          It might be better not to log this and MATH-1020 as separate issues - i.e., this is just work related to making the formerly private internal method public. As you note, the bugged API spec / impl does not impact its use in RandomDataGenerator. Tracking this and MATH-1020 as issues against 3.2 and reporting their resolution in 3.3 may be misleading to users reading the release notes. The 3.2 implementation of nextPermutation in RandomDataGenerator performs according to spec and seeing the bug report and resolution in the 3.3 release notes may give users the impression that there is an actual usage-impacting bug in the 3.2 impl, which is not true. May be best to just note the problem in the other issue (MATH-1010) and drop this and MATH-1020 as separate issues.

          Show
          Phil Steitz added a comment - It might be better not to log this and MATH-1020 as separate issues - i.e., this is just work related to making the formerly private internal method public. As you note, the bugged API spec / impl does not impact its use in RandomDataGenerator. Tracking this and MATH-1020 as issues against 3.2 and reporting their resolution in 3.3 may be misleading to users reading the release notes. The 3.2 implementation of nextPermutation in RandomDataGenerator performs according to spec and seeing the bug report and resolution in the 3.3 release notes may give users the impression that there is an actual usage-impacting bug in the 3.2 impl, which is not true. May be best to just note the problem in the other issue ( MATH-1010 ) and drop this and MATH-1020 as separate issues.
          Hide
          Gilles added a comment -

          IMHO, the svn log and the bug tracking system are not solely directed to library users.
          Developers too might like to be able to follow why and how code changes.

          I understand your concern, but "historically" MATH-1010 was not meant to fix something, just to move code from one place to another; I fortuitously uncovered the bug because the method was intended to do more than is actually needed in "RandomDataGenerator" ("user" behaviour was indeed fine but "developer" code was misleading).

          In order to not frighten users, I can add a comment in the "changes.xml" file to that effect; i.e. for MATH-1020, something like: This bug does not affect applications using a previous version of Commons Math. For this issue, it should be obvious since the method "shuffle" was private.

          Show
          Gilles added a comment - IMHO, the svn log and the bug tracking system are not solely directed to library users. Developers too might like to be able to follow why and how code changes. I understand your concern, but "historically" MATH-1010 was not meant to fix something, just to move code from one place to another; I fortuitously uncovered the bug because the method was intended to do more than is actually needed in "RandomDataGenerator" ("user" behaviour was indeed fine but "developer" code was misleading). In order to not frighten users, I can add a comment in the "changes.xml" file to that effect; i.e. for MATH-1020 , something like: This bug does not affect applications using a previous version of Commons Math. For this issue, it should be obvious since the method "shuffle" was private.
          Hide
          Phil Steitz added a comment -

          I like to keep what is in JIRA understandable to users, but I get your point and am OK with keeping these as separate issues as long as you make it clear in changes.xml. Also, what would you think about changing the affects version to 3.3? Unfortunately, "nightly builds" does not seem to be an option.

          Show
          Phil Steitz added a comment - I like to keep what is in JIRA understandable to users, but I get your point and am OK with keeping these as separate issues as long as you make it clear in changes.xml. Also, what would you think about changing the affects version to 3.3? Unfortunately, "nightly builds" does not seem to be an option.
          Hide
          Gilles added a comment -

          "Affects Version" entry set to empty to indicate that the bug does not affect applications.

          Show
          Gilles added a comment - "Affects Version" entry set to empty to indicate that the bug does not affect applications.
          Gilles made changes -
          Affects Version/s 3.2 [ 12322545 ]
          Gilles made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Gilles added a comment -

          The Javadoc for "shuffle" must be updated: a request to the
          reference link
          http://www.maths.abdn.ac.uk/~igc/tch/mx4002/notes/node83.html
          replies "Forbidden".

          Show
          Gilles added a comment - The Javadoc for "shuffle" must be updated: a request to the reference link http://www.maths.abdn.ac.uk/~igc/tch/mx4002/notes/node83.html replies "Forbidden".
          Hide
          Phil Steitz added a comment -

          This looks like a published version of the course notes in the now dead reference:
          http://citeseerx.ist.psu.edu/viewdoc/download;?doi=10.1.1.173.1898&rep=rep1&type=pdf

          The algorithm is on page 68. Not sure how long this will be available online, so it may be best just to reference the dead trees version.

          Show
          Phil Steitz added a comment - This looks like a published version of the course notes in the now dead reference: http://citeseerx.ist.psu.edu/viewdoc/download;?doi=10.1.1.173.1898&rep=rep1&type=pdf The algorithm is on page 68. Not sure how long this will be available online, so it may be best just to reference the dead trees version.
          Hide
          Gilles added a comment -

          [...] so it may be best just to reference the dead trees version.

          I'm not sure I understand: Do you suggest to leave an unreachable link in the Javadoc?

          Wouldn't this link be an adequate reference?

          Show
          Gilles added a comment - [...] so it may be best just to reference the dead trees version. I'm not sure I understand: Do you suggest to leave an unreachable link in the Javadoc? Wouldn't this link be an adequate reference?
          Hide
          Phil Steitz added a comment -

          Sorry, but the "dead trees version" I meant the published book. The link above is to a pdf of the book. I think it is better to stick with this, which is the original reference that describes what we have implemented exactly. The wikipedia article covers several different algorithms, one of which is similar to what we have.

          Show
          Phil Steitz added a comment - Sorry, but the "dead trees version" I meant the published book. The link above is to a pdf of the book. I think it is better to stick with this, which is the original reference that describes what we have implemented exactly. The wikipedia article covers several different algorithms, one of which is similar to what we have.
          Hide
          Gilles added a comment - - edited

          [...] "dead trees version" [...]

          Oh, I got the expression now.

          But I still don't see the reference; the one in the Javadoc is not a link to a book, and since the link is dead, it couldn't be more useless.
          Then the one which you mention is a PDF, and IIUC, you said that it might not be reliable to reference that link.

          The wikipedia article covers several different algorithms, one of which is similar to what we have.

          Thus we can put a link to the appropriate section.
          Also it's more user-friendly IMO to be redirected to that section than to have to download the whole textbook and refer to a page number.

          Show
          Gilles added a comment - - edited [...] "dead trees version" [...] Oh, I got the expression now. But I still don't see the reference; the one in the Javadoc is not a link to a book, and since the link is dead, it couldn't be more useless. Then the one which you mention is a PDF, and IIUC, you said that it might not be reliable to reference that link. The wikipedia article covers several different algorithms, one of which is similar to what we have. Thus we can put a link to the appropriate section . Also it's more user-friendly IMO to be redirected to that section than to have to download the whole textbook and refer to a page number.
          Hide
          Phil Steitz added a comment - - edited

          OK, assuming the section link works and what is described there matches what we have, I agree it is easier. It would be nice to reference the e-book in any case, as that was the original source. The lecture notes linked in the javadoc were eventually published as the ebook and removed from the university web site. At least the relevant section matches my recollection of the nice description and justification for the shuffle algorithm in the original link. I see now that there does not actually appear to be a "dead trees version" after all. apologies for that I assumed the pdf was an image of a physical book published by the University of Aberdeen. Looks like it is only distributed as an ebook, generally behind a paywall. It can be referenced as Algorithms, by Ian Craw, John Pulham, University of Aberdeen 1999.

          Show
          Phil Steitz added a comment - - edited OK, assuming the section link works and what is described there matches what we have, I agree it is easier. It would be nice to reference the e-book in any case, as that was the original source. The lecture notes linked in the javadoc were eventually published as the ebook and removed from the university web site. At least the relevant section matches my recollection of the nice description and justification for the shuffle algorithm in the original link. I see now that there does not actually appear to be a "dead trees version" after all. apologies for that I assumed the pdf was an image of a physical book published by the University of Aberdeen. Looks like it is only distributed as an ebook, generally behind a paywall. It can be referenced as Algorithms , by Ian Craw, John Pulham, University of Aberdeen 1999.
          Hide
          Gilles added a comment -

          I hope that changes brought in revision 1513501 are fine.

          Show
          Gilles added a comment - I hope that changes brought in revision 1513501 are fine.

            People

            • Assignee:
              Unassigned
              Reporter:
              Gilles
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development