Hadoop Map/Reduce / MAPREDUCE-4645

Providing a random seed to Slive should make the sequence of filenames completely deterministic

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.1, 2.0.0-alpha
    • Fix Version/s: 0.23.4
    • Component/s: performance, test
    • Hadoop Flags: Reviewed

      Description

      Using the random seed option still doesn't produce a deterministic sequence of filenames, so there is no way to replicate a performance test. If I'm providing a seed, it's obvious that I want the test to be reproducible.

      1. MAPREDUCE-4645.branch-0.23.patch
        4 kB
        Ravi Prakash
      2. MAPREDUCE-4645.branch-0.23.patch
        4 kB
        Ravi Prakash
      3. MAPREDUCE-4645.branch-0.23.patch
        8 kB
        Ravi Prakash

        Activity

        Ravi Prakash added a comment -

        This patch changes the dummy key for the SliveMapper to be a "splitID" and seeds the random number generator with that splitID plus the user-specified seed. The PathFinder, which generates the paths, is also given its own separate instance of Random. As a result, if you run the same Slive command twice, all ops succeed the first time and fail the second time (because the files were already created / deleted by the first run).
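        A minimal Java sketch of the scheme described in this comment, under stated assumptions: the class `DeterministicPaths`, its fields, and the `/slive/file_N` naming are hypothetical and are not Slive's real `PathFinder` or `SliveMapper` code. Each mapper seeds its generators from the user-supplied seed plus its split ID, and path generation uses its own `java.util.Random`, so the filename sequence does not depend on how many random numbers the operation logic happens to consume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch (not Slive's actual classes): the seed is derived from the
// user-supplied seed plus the split ID, and paths come from a dedicated Random that
// is separate from the Random driving operation selection.
class DeterministicPaths {
    private final Random opRandom;   // drives operation choice and outcome-dependent draws
    private final Random pathRandom; // used only to pick file names
    private final String baseDir;
    private final int maxFiles;

    DeterministicPaths(long userSeed, int splitId, String baseDir, int maxFiles) {
        long seed = userSeed + splitId; // same inputs give the same seed on every run
        // Both generators start from the same seed; what matters is that
        // their streams are consumed independently.
        this.opRandom = new Random(seed);
        this.pathRandom = new Random(seed);
        this.baseDir = baseDir;
        this.maxFiles = maxFiles;
    }

    /** Next file name, drawn only from pathRandom. */
    String nextPath() {
        return baseDir + "/file_" + pathRandom.nextInt(maxFiles);
    }

    /**
     * Generates n paths while simulating operation logic whose random
     * consumption depends on whether operations succeed. Because paths
     * come from their own Random, the returned sequence is the same either way.
     */
    List<String> firstPaths(int n, boolean opsSucceed) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            opRandom.nextInt();         // pick an operation
            paths.add(nextPath());      // pick the target path
            if (opsSucceed) {
                opRandom.nextInt();     // e.g. an extra draw only when the op succeeds
            }
        }
        return paths;
    }
}
```

        Keeping a dedicated Random for paths is one plausible reason the second run hits exactly the files the first run created: even if operation outcomes change how many numbers the operation-side generator draws, the path generator yields the same names in the same order.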

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12544273/MAPREDUCE-4645.branch-0.23.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2834//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2834//console

        This message is automatically generated.

        Ravi Prakash added a comment -

        The same patch applies to branch-2 and trunk.

        Konstantin Shvachko added a comment -

        Ravi, can you just use the taskID as the seed instead of passing the sequence number through DummyInputFormat? That way you will have different seeds per map, but the run is still completely reproducible because the number of maps is the same.
        Otherwise DummyInputFormat is no longer "dummy" and EmptySplit is no longer "empty".
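        The taskID-based alternative can be sketched in the same hedged way. Here the numeric task index is passed in as a plain `int` (how the mapper obtains it from its task context is omitted), and `perTaskSeed` simply adds it to the user seed; both the helper and the additive combination are illustrative assumptions, not necessarily what the committed patch does.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: per-map seeding from a user seed and the map's task index.
class TaskSeeding {

    /** Illustrative combination of the user seed and task index (an assumption). */
    static long perTaskSeed(long userSeed, int taskIndex) {
        return userSeed + taskIndex;
    }

    /** First n draws in [0, 1000) for the given map; identical on every run. */
    static int[] draws(long userSeed, int taskIndex, int n) {
        Random rng = new Random(perTaskSeed(userSeed, taskIndex));
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextInt(1000);
        }
        return out;
    }

    public static void main(String[] args) {
        long userSeed = 12345L;
        int numMaps = 4; // same job configuration, same set of task indices
        for (int task = 0; task < numMaps; task++) {
            System.out.println("map " + task + ": " + Arrays.toString(draws(userSeed, task, 5)));
        }
    }
}
```

        Keying on the task ID rather than a task attempt ID helps reproducibility: a re-executed attempt of the same task keeps its task ID, so it replays the same sequence. Plain addition also means seed s with task 1 collides with seed s + 1 with task 0; a mixing function would avoid that if it mattered.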

        Ravi Prakash added a comment -

        Thanks for your review and suggestion, Konstantin! I've updated the patch to use the taskID to seed the RNG.

        Ravi Prakash added a comment -

        Updated the patch so that branch-1 / 0.20 can run Slive with the same jar / code.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12546363/MAPREDUCE-4645.branch-0.23.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2871//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2871//console

        This message is automatically generated.

        Konstantin Shvachko added a comment -

        +1 Looks good.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2825 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2825/)
        MAPREDUCE-4645. Provide a random seed to Slive to make the sequence of file names deterministic. Contributed by Ravi Prakash. (Revision 1389568)

        Result = SUCCESS
        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1389568
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/Operation.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/SliveMapper.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2762 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2762/)
        MAPREDUCE-4645. Provide a random seed to Slive to make the sequence of file names deterministic. Contributed by Ravi Prakash. (Revision 1389568)

        Result = SUCCESS
        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1389568
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/Operation.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/SliveMapper.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2784 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2784/)
        MAPREDUCE-4645. Provide a random seed to Slive to make the sequence of file names deterministic. Contributed by Ravi Prakash. (Revision 1389568)

        Result = FAILURE
        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1389568
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/Operation.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/SliveMapper.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2826 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2826/)
        Move MAPREDUCE-4645 under 0.23.4 release section in CHANGES.txt (Revision 1389576)

        Result = SUCCESS
        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1389576
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2763 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2763/)
        Move MAPREDUCE-4645 under 0.23.4 release section in CHANGES.txt (Revision 1389576)

        Result = SUCCESS
        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1389576
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        Konstantin Shvachko added a comment -

        I just committed it to trunk, branch-2, and branch-0.23.
        Thank you, Ravi.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2785 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2785/)
        Move MAPREDUCE-4645 under 0.23.4 release section in CHANGES.txt (Revision 1389576)

        Result = FAILURE
        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1389576
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #385 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/385/)
        MAPREDUCE-4645. Provide a random seed to Slive to make the sequence of file names deterministic. Contributed by Ravi Prakash. (Revision 1389602)

        Result = UNSTABLE
        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1389602
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/Operation.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/SliveMapper.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1176 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1176/)
        Move MAPREDUCE-4645 under 0.23.4 release section in CHANGES.txt (Revision 1389576)
        MAPREDUCE-4645. Provide a random seed to Slive to make the sequence of file names deterministic. Contributed by Ravi Prakash. (Revision 1389568)

        Result = SUCCESS
        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1389576
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1389568
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/Operation.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/SliveMapper.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1207 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1207/)
        Move MAPREDUCE-4645 under 0.23.4 release section in CHANGES.txt (Revision 1389576)
        MAPREDUCE-4645. Provide a random seed to Slive to make the sequence of file names deterministic. Contributed by Ravi Prakash. (Revision 1389568)

        Result = SUCCESS
        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1389576
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1389568
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/Operation.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive/SliveMapper.java
        Ravi Prakash added a comment -

        Thanks a lot, Konstantin!


          People

          • Assignee: Ravi Prakash
          • Reporter: Ravi Prakash
          • Votes: 0
          • Watchers: 6
