Hadoop Common
  1. Hadoop Common
  2. HADOOP-4437

Use qMC sequence to improve the accuracy of PiEstimator

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently, PiEstimator uses java.util.Random to generate random 2d-points for estimating pi. The numbers generated by java.util.Random are uniformly distributed. The 2d-points generated tense to have clump and gap. So the accuracy of the estimated pi is low. The accuracy can be improved by using a quasi-Monte Carlo (qMC) sequence.

      1. 4437_20081019.patch
        6 kB
        Tsz Wo Nicholas Sze
      2. 4437_20081103.patch
        6 kB
        Tsz Wo Nicholas Sze

        Issue Links

          Activity

          Hide
          Tsz Wo Nicholas Sze added a comment -

          Tested manually with a 100-nodes cluster. The biggest sample set I have run was

          • 1000 maps and 10000000 samples per map.
            Job Finished in 67.337 seconds
            Estimated value of PI is 3.14159264520000000000

          I just committed this.

          Show
          Tsz Wo Nicholas Sze added a comment - Tested manually with a 100-nodes cluster. The biggest sample set I have run was 1000 maps and 10000000 samples per map. Job Finished in 67.337 seconds Estimated value of PI is 3.14159264520000000000 I just committed this.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > This is nice. I understand it is just an example but if we run more maps for a longer period of time can we get more Pi digits?

          Yes, the more samples used the more digits will get in both Monte Carlo method (java.util.Random) and qMC method (Halton sequence).

          However, the discrepancy for Halton sequence is smaller than java.util.Random. The expected error of java.util.Random is O(1/sqrt(N)) while the expected error of using Halton sequence is O((ln N)/N), where N is the number for samples. For estimating Pi with 100,000,000 samples, the accuracy of Halton is ~7 digits but java.util.Random is only ~4 digits as shown previously.

          > Now, as an example it should imo have much better documentation.

          This is a good point. I plan to further improve the PiEstimator. Let me also improve the documentation in the next issue.

          Show
          Tsz Wo Nicholas Sze added a comment - > This is nice. I understand it is just an example but if we run more maps for a longer period of time can we get more Pi digits? Yes, the more samples used the more digits will get in both Monte Carlo method (java.util.Random) and qMC method (Halton sequence). However, the discrepancy for Halton sequence is smaller than java.util.Random. The expected error of java.util.Random is O(1/sqrt(N)) while the expected error of using Halton sequence is O((ln N)/N), where N is the number for samples. For estimating Pi with 100,000,000 samples, the accuracy of Halton is ~7 digits but java.util.Random is only ~4 digits as shown previously. > Now, as an example it should imo have much better documentation. This is a good point. I plan to further improve the PiEstimator. Let me also improve the documentation in the next issue.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12393275/4437_20081103.patch
          against trunk revision 709609.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12393275/4437_20081103.patch against trunk revision 709609. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/console This message is automatically generated.
          Hide
          Konstantin Shvachko added a comment -

          This is nice. I understand it is just an example but if we run more maps for a longer period of time can we get more Pi digits?
          Now, as an example it should imo have much better documentation.

          Show
          Konstantin Shvachko added a comment - This is nice. I understand it is just an example but if we run more maps for a longer period of time can we get more Pi digits? Now, as an example it should imo have much better documentation.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          4437_20081103.patch: added javadoc, no code changes.

          Show
          Tsz Wo Nicholas Sze added a comment - 4437_20081103.patch: added javadoc, no code changes.
          Hide
          Chris Douglas added a comment -

          Since it's in the examples, a few brief notes on the initialization of HaltonSequence and on nextPoint would help orient readers. It doesn't need to be a course in statistics- the existing code isn't, either- but even a sentence or two on why PiEstimator is using it and perhaps a couple comments identifying the variables would be helpful.

          +1 on the patch, though; documentation is just a suggestion.

          Show
          Chris Douglas added a comment - Since it's in the examples, a few brief notes on the initialization of HaltonSequence and on nextPoint would help orient readers. It doesn't need to be a course in statistics- the existing code isn't, either- but even a sentence or two on why PiEstimator is using it and perhaps a couple comments identifying the variables would be helpful. +1 on the patch, though; documentation is just a suggestion.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          4437_20081019.patch: replace java.util.Random with Halton sequence.

          Try totally 100000000 samples

          • Before the patch:
            Job Finished in 22.422 seconds
            Estimated value of PI is 3.14145832
          • After the patch:
            Job Finished in 13.375 seconds
            Estimated value of PI is 3.14159256000000000000
          Show
          Tsz Wo Nicholas Sze added a comment - 4437_20081019.patch: replace java.util.Random with Halton sequence. Try totally 100000000 samples Before the patch: Job Finished in 22.422 seconds Estimated value of PI is 3.14145832 After the patch: Job Finished in 13.375 seconds Estimated value of PI is 3.14159256000000000000

            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development