Commons Math
  1. Commons Math
  2. MATH-845

Basic number theory features such as primality testing, factorization and prime number generation

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: 3.2
    • Labels:
    • Environment:

      ubuntu/java6/intel-i5

      Description

      A set of static methods to perform primality test, factorization and prime number generation. Currently it is limited to the int data type, extension to long/BigInteger will follow.

      1. primes-src.zip
        14 kB
        Sebastien Riou

        Activity

        Luc Maisonobe made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Luc Maisonobe added a comment -

        Closing issue as version 3.2 has been released on 2013-04-06.

        Show
        Luc Maisonobe added a comment - Closing issue as version 3.2 has been released on 2013-04-06.
        Luc Maisonobe made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Luc Maisonobe added a comment -

        Fixed in subversion repository as of r1454920.

        Patch applied with a few changes:

        • replaced assert by comments
        • formatting (spaces, braces, control statements on one line...)
        • very long performance tests not committed
        • updated tests to Junit 4

        As the latest comments suggested it would not be easy to extend this package to long or BigIntegers, I have left the Primes class as first proposed, without an upper parameterized interface.

        Thanks for the suggestion and for the patch!

        Show
        Luc Maisonobe added a comment - Fixed in subversion repository as of r1454920. Patch applied with a few changes: replaced assert by comments formatting (spaces, braces, control statements on one line...) very long performance tests not committed updated tests to Junit 4 As the latest comments suggested it would not be easy to extend this package to long or BigIntegers, I have left the Primes class as first proposed, without an upper parameterized interface. Thanks for the suggestion and for the patch!
        Gilles made changes -
        Fix Version/s 3.2 [ 12322545 ]
        Fix Version/s 3.1 [ 12317576 ]
        Hide
        Gilles added a comment -

        [...] I tend to prefer assertions over simple comments (assertions could be turned on (at least, locally) during testing).

        IMO, neither "assert" statements nor simple comments are useful for testing (or more precisely, debugging). What we'd need is logging.

        "assert" aborts the program, but does not tell you how the wrong value made it there.
        I think that "assert" is very useful, but in applications (meaning: don't bother with the rest of the computation, as I know that someting went wrong).
        By contrast, in CM, no "assert" should ever be triggered from internal calls. Furthermore, no "assert" should ever be triggered from external calls, as that would mean that unsafe (if assertions are disabled) code can be called.
        To be robust, CM raises a "MathInternalError" at places no code path should ever lead to; but if one does, at least something predictable happens.

        One thing to check in the first place is whether a precondition check actually slows down things in an "unbearable" way.

        Show
        Gilles added a comment - [...] I tend to prefer assertions over simple comments (assertions could be turned on (at least, locally) during testing). IMO, neither "assert" statements nor simple comments are useful for testing (or more precisely, debugging). What we'd need is logging. "assert" aborts the program, but does not tell you how the wrong value made it there. I think that "assert" is very useful, but in applications (meaning: don't bother with the rest of the computation, as I know that someting went wrong). By contrast, in CM, no "assert" should ever be triggered from internal calls. Furthermore, no "assert" should ever be triggered from external calls, as that would mean that unsafe (if assertions are disabled) code can be called. To be robust, CM raises a "MathInternalError" at places no code path should ever lead to; but if one does, at least something predictable happens. One thing to check in the first place is whether a precondition check actually slows down things in an "unbearable" way.
        Hide
        Sébastien Brisard added a comment -

        In cases where the method is for internal use only, and we are sure that the CM code calls it correctly, my preference would be to just remove the unused statement, and replace it with an appropriate comment stating why an explicit check is not necessary.

        Interesting! I was about to ask the same question on the ML for another issue (incomplete beta function).
        I really would like to have this point discussed more widely, as I tend to prefer assertions over simple comments (assertions could be turned on (at least, locally) during testing).

        Of course, I'm in favor of an additional comment stating that explicit check is deemed unnecessary for private or package private methods. In my case, some package private methods provide approximations of some functions over a specific interval. I would then state in the javadoc the interval, and also that it is the responsibility of the user to make sure that this approximation is not called outside its domain.

        @Sébastien: since you first asked the question, would you like to start a thread on the ML?

        Best regards,
        Sébastien (another one )

        Show
        Sébastien Brisard added a comment - In cases where the method is for internal use only, and we are sure that the CM code calls it correctly, my preference would be to just remove the unused statement, and replace it with an appropriate comment stating why an explicit check is not necessary. Interesting! I was about to ask the same question on the ML for another issue (incomplete beta function). I really would like to have this point discussed more widely, as I tend to prefer assertions over simple comments (assertions could be turned on (at least, locally) during testing). Of course, I'm in favor of an additional comment stating that explicit check is deemed unnecessary for private or package private methods. In my case, some package private methods provide approximations of some functions over a specific interval. I would then state in the javadoc the interval, and also that it is the responsibility of the user to make sure that this approximation is not called outside its domain. @Sébastien: since you first asked the question, would you like to start a thread on the ML? Best regards, Sébastien (another one )
        Hide
        Gilles added a comment -

        I have modified my local copy of pom.xml [...]

        That should not be necessary, the "pom.xml" in "trunk" already contains the necessary configuration.

        But I'm not sure that those formatting rules which I referred to above are enforced by CheckStyle. I just did a manual detection!

        About "assert" [...]

        This an interesting question. Maybe you should raise the issue on the "dev" ML.
        AFAIK, there is no policy for the use of "assert" within the CM code.
        In the case of a library like CM, robustness is important; I guess that it is less safe to use "assert" since an application could silently fail if assertions are not enabled.
        In cases where the method is for internal use only, and we are sure that the CM code calls it correctly, my preference would be to just remove the unused statement, and replace it with an appropriate comment stating why an explicit check is not necessary.

        Show
        Gilles added a comment - I have modified my local copy of pom.xml [...] That should not be necessary, the "pom.xml" in "trunk" already contains the necessary configuration. But I'm not sure that those formatting rules which I referred to above are enforced by CheckStyle. I just did a manual detection! About "assert" [...] This an interesting question. Maybe you should raise the issue on the "dev" ML. AFAIK, there is no policy for the use of "assert" within the CM code. In the case of a library like CM, robustness is important; I guess that it is less safe to use "assert" since an application could silently fail if assertions are not enabled. In cases where the method is for internal use only, and we are sure that the CM code calls it correctly, my preference would be to just remove the unused statement, and replace it with an appropriate comment stating why an explicit check is not necessary.
        Hide
        Sebastien Riou added a comment - - edited

        Thanks for the review, I will apply the requested formatting next time. So I guess I am not running checkstyle correctly, because I did not see any of those errors in the report. I run it from command line "mvn checkstyle:checkstyle", and I have modified my local copy of pom.xml with the following:

              <plugin>
                  <groupId>org.apache.maven.plugins</groupId>
                  <artifactId>maven-checkstyle-plugin</artifactId>
                  <version>2.9.1</version>
                  <configuration>
                      <configLocation>checkstyle.xml</configLocation>
                  </configuration>
              </plugin>
        

        inserted in <build>. Is there something I am missing ?

        About "assert", In the case of private or package private methods, is it also required to turn them into runtime checks ? Is it acceptable to simply comment them if the performance impact is significant ?

        interface Primes<T extends Number> -> agreed

        implementation/performance question: I expect implementation for long to be significantly slower, as it is the case for gcd. int implementation uses long for some internal values, so a long implementation will need to use BigInteger at those places, that should slow down things... Also current int implementation contain an array with all primes smaller or equal to the cubic square of Integer.MAX_VALUE. Doing the same for Long.MAX_VALUE is going to significantly increase the size of the array, I am not sure if this is desirable -> in short some performance tricks applicable to int don't scale well. In case of isPrime, the trick to get a provable result using Miller-Rabin works well for int, but I have to search additional reference to get the magic numbers to scale it to long, and even after that, maybe we will end up with something slower than a true provable algorithm.
        I am more inclined to share the same implementation for long and BigInteger.

        Show
        Sebastien Riou added a comment - - edited Thanks for the review, I will apply the requested formatting next time. So I guess I am not running checkstyle correctly, because I did not see any of those errors in the report. I run it from command line "mvn checkstyle:checkstyle", and I have modified my local copy of pom.xml with the following: <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-checkstyle-plugin</artifactId> <version>2.9.1</version> <configuration> <configLocation>checkstyle.xml</configLocation> </configuration> </plugin> inserted in <build>. Is there something I am missing ? About "assert", In the case of private or package private methods, is it also required to turn them into runtime checks ? Is it acceptable to simply comment them if the performance impact is significant ? interface Primes<T extends Number> -> agreed implementation/performance question: I expect implementation for long to be significantly slower, as it is the case for gcd. int implementation uses long for some internal values, so a long implementation will need to use BigInteger at those places, that should slow down things... Also current int implementation contain an array with all primes smaller or equal to the cubic square of Integer.MAX_VALUE. Doing the same for Long.MAX_VALUE is going to significantly increase the size of the array, I am not sure if this is desirable -> in short some performance tricks applicable to int don't scale well. In case of isPrime, the trick to get a provable result using Miller-Rabin works well for int, but I have to search additional reference to get the magic numbers to scale it to long, and even after that, maybe we will end up with something slower than a true provable algorithm. I am more inclined to share the same implementation for long and BigInteger.
        Hide
        Gilles added a comment -

        After a very quick look at the code, some remarks about style:

        • "if"-blocks must not be on the same line as the condition check, even if there is a single statement.
        • There must be one space characted before an opening bracket.
        • Closing brackets must be on the following line.
        • "assert" statements should be replaced by precondition checks (raising an appropriate exception if they fail).

        Design question:
        Wouldn't it be useful to create an interface Primes<T extends Number> that would define all the methods to be implemented for each type (Integer, Long, BigInteger, ...)? Your "Primes" class then becomes

        public class IntegerPrimes implements Primes<Integer> {
          // ...
        }
        

        Implementation (and performance) question: Is it useful to have this functionality for "int" rather than just for "long". I mean: Is the "int" implementation faster in any actual application?
        If not, what would it take to convert all your proposed code to work with "long"?

        Show
        Gilles added a comment - After a very quick look at the code, some remarks about style: "if"-blocks must not be on the same line as the condition check, even if there is a single statement. There must be one space characted before an opening bracket. Closing brackets must be on the following line. "assert" statements should be replaced by precondition checks (raising an appropriate exception if they fail). Design question: Wouldn't it be useful to create an interface Primes<T extends Number> that would define all the methods to be implemented for each type (Integer, Long, BigInteger, ...)? Your "Primes" class then becomes public class IntegerPrimes implements Primes< Integer > { // ... } Implementation (and performance) question: Is it useful to have this functionality for "int" rather than just for "long". I mean: Is the "int" implementation faster in any actual application? If not, what would it take to convert all your proposed code to work with "long"?
        Sebastien Riou made changes -
        Field Original Value New Value
        Attachment primes-src.zip [ 12540614 ]
        Hide
        Sebastien Riou added a comment -

        Implementation of three methods: primality test, factorization and prime generation (nextPrime method). Following the suggestion of Gilles, it is in a new package rather than inserted in existing files. It contains an improved version of Pollard's rho method. After breaking my head to get it work, I found out that it is not faster than trial division, the int range must be too small to see the benefits. I included it anyway, it may be more adapted for dealing with long and BigInteger types.

        Show
        Sebastien Riou added a comment - Implementation of three methods: primality test, factorization and prime generation (nextPrime method). Following the suggestion of Gilles, it is in a new package rather than inserted in existing files. It contains an improved version of Pollard's rho method. After breaking my head to get it work, I found out that it is not faster than trial division, the int range must be too small to see the benefits. I included it anyway, it may be more adapted for dealing with long and BigInteger types.
        Sebastien Riou created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Sebastien Riou
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 24h
              24h
              Remaining:
              Remaining Estimate - 24h
              24h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development