Hama / HAMA-423

Improve and Refactor Partitioning in the Examples

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.4.0
    • Component/s: examples
    • Labels:
      None

      Description

      Currently, partitioning writes one key/value pair for each vertex/adjacent mapping.
      This causes heavy I/O that bloats the file and makes partitioning take unnecessarily long.

      We should partition directly into the vertex classes and implement a vertex list/array Writable that writes a single key/value pair for each vertex/all-adjacents mapping.

      In fact, we should make it generic by passing in a vertex class that implements the Writable interface.
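
      A minimal sketch of the "one value per vertex" idea, not the code from the attached patch. To stay runnable without Hadoop on the classpath it uses a hypothetical `SimpleWritable` interface that mirrors the two methods of Hadoop's `Writable`; the names `WeightedVertex` and `VertexArray`, the helper methods, and the sample cities and weights are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class VertexArraySketch {

  /** Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods). */
  interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  /** Hypothetical vertex: a name plus the weight of the edge leading to it. */
  static final class WeightedVertex implements SimpleWritable {
    String name = "";
    int weight;

    WeightedVertex() {}

    WeightedVertex(String name, int weight) {
      this.name = name;
      this.weight = weight;
    }

    @Override public void write(DataOutput out) throws IOException {
      out.writeUTF(name);
      out.writeInt(weight);
    }

    @Override public void readFields(DataInput in) throws IOException {
      name = in.readUTF();
      weight = in.readInt();
    }
  }

  /** All adjacent vertices of one vertex, serialized as a single value. */
  static final class VertexArray implements SimpleWritable {
    final List<WeightedVertex> adjacents = new ArrayList<>();

    @Override public void write(DataOutput out) throws IOException {
      out.writeInt(adjacents.size());          // length prefix
      for (WeightedVertex v : adjacents) {
        v.write(out);                          // then each adjacent vertex
      }
    }

    @Override public void readFields(DataInput in) throws IOException {
      adjacents.clear();
      int n = in.readInt();
      for (int i = 0; i < n; i++) {
        WeightedVertex v = new WeightedVertex();
        v.readFields(in);
        adjacents.add(v);
      }
    }
  }

  /** Serializes a value to a byte array (in-memory stand-in for a file write). */
  static byte[] toBytes(SimpleWritable w) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      w.write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  /** Fills an existing value from a byte array. */
  static void fromBytes(SimpleWritable w, byte[] bytes) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      w.readFields(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Illustrative data only: one record holds every adjacent of a vertex.
    VertexArray value = new VertexArray();
    value.adjacents.add(new WeightedVertex("Mainz", 20));
    value.adjacents.add(new WeightedVertex("Kassel", 173));

    VertexArray copy = new VertexArray();
    fromBytes(copy, toBytes(value));
    System.out.println(copy.adjacents.size() + " adjacents restored");
  }
}
```

      With this layout a vertex with n neighbours costs one record instead of n, which is the saving the description is after. The sketch is not generic over the vertex type; that part is left out here.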

      1. HAMA-423-v1.patch
        54 kB
        Thomas Jungblut
      2. HAMA-423-withoutCRs.patch
        54 kB
        Thomas Jungblut
      3. sickimprovement.PNG
        10 kB
        Thomas Jungblut

        Activity

        thomas.jungblut Thomas Jungblut added a comment -

        I really did a lot of stuff here.
        But partitioning will now take about 1 minute for our example files.

        I'm going to extend the wiki. Currently I am uploading the new .txt example files to trunk.

        thomas.jungblut Thomas Jungblut added a comment -

        11/08/20 01:58:47 INFO graph.ShortestPaths: Starting data partitioning...
        11/08/20 01:59:37 INFO graph.ShortestPaths: Finished!

        for 2.000.000 vertices. Sounds nice ;D

        thomas.jungblut Thomas Jungblut added a comment - - edited

        Once committed, we have to rewrite http://wiki.apache.org/hama/SSSP. The input is now a text file (available here: http://hama-shortest-paths.googlecode.com/svn/trunk/hama-gsoc/files/cities-adjacencylist/sssp-adjacencylist.txt), both for people who want to submit their own graph and for those who download a large sequence file.

        Later on we can extend AbstractGraphPartitioner to work with some kind of input format, record reader, or compression codec, so this is one step toward HAMA-258.
        We can use this in the Pregel API too, since the vertex class is looked up via configuration:

        (Class<T>) conf.getClass("hama.partitioning.vertex.class",
                Vertex.class);
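
        A rough sketch of what such a class-based lookup does, under stated assumptions: `java.util.Properties` stands in for the Hadoop configuration object so the example runs without extra dependencies, and the `Vertex` and `CityVertex` classes and the `vertexClass`/`newVertex` helpers are hypothetical. Only the key `hama.partitioning.vertex.class` and the default of `Vertex.class` come from the snippet above.

```java
import java.util.Properties;

public class VertexClassLookupSketch {

  /** Hypothetical default vertex type, standing in for Vertex in the snippet above. */
  public static class Vertex {
    @Override public String toString() {
      return "Vertex";
    }
  }

  /** Hypothetical user-supplied vertex type. */
  public static class CityVertex extends Vertex {
    @Override public String toString() {
      return "CityVertex";
    }
  }

  /** Resolves the configured class, falling back to Vertex when the key is unset. */
  static Class<? extends Vertex> vertexClass(Properties conf) {
    String name = conf.getProperty("hama.partitioning.vertex.class");
    if (name == null) {
      return Vertex.class;
    }
    try {
      return Class.forName(name).asSubclass(Vertex.class);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Unknown vertex class: " + name, e);
    }
  }

  /** Creates a vertex of the configured type through its no-arg constructor. */
  static Vertex newVertex(Properties conf) {
    try {
      return vertexClass(conf).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Vertex class needs a public no-arg constructor", e);
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(newVertex(conf)); // prints "Vertex" (default)

    conf.setProperty("hama.partitioning.vertex.class", CityVertex.class.getName());
    System.out.println(newVertex(conf)); // prints "CityVertex"
  }
}
```

        The point of resolving the type from configuration is that the partitioner never has to name a concrete vertex class, so another API could plug in its own type by setting one key.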
        

        Would someone please review this with a sample file?

        udanax Edward J. Yoon added a comment -

        Good job!

        udanax Edward J. Yoon added a comment -

        Minor comment here,

        Your patch always contains trailing CRs. Please remove them, and see also HAMA-416.

        udanax Edward J. Yoon added a comment -

        Here are my console results with the new patch and text file on a physical 16-node cluster. Works well.

        root@hnode1:/usr/local/src/hama-trunk/core# bin/hama jar ../examples/target/hama-examples-0.4.0-incubating-SNAPSHOT.jar sssp Umanap edward/sssp-output /user/root/edward/sssp-adjacencylist.txt
        Single Source Shortest Path Example:
        <Startvertex name> <optional: output path> <optional: path to own adjacency list textfile!>
        Setting default start vertex to "Frankfurt"!
        Setting start vertex to Umanap!
        Using new output folder: edward/sssp-output
        11/08/22 11:00:15 INFO graph.ShortestPaths: Starting data partitioning...
        11/08/22 11:01:03 INFO graph.ShortestPaths: Finished!
        11/08/22 11:01:04 INFO bsp.BSPJobClient: Running job: job_201108221035_0004
        11/08/22 11:01:07 INFO bsp.BSPJobClient: Current supersteps number: 0
        11/08/22 11:01:13 INFO bsp.BSPJobClient: Current supersteps number: 2
        11/08/22 11:01:16 INFO bsp.BSPJobClient: Current supersteps number: 10
        11/08/22 11:01:19 INFO bsp.BSPJobClient: Current supersteps number: 14
        11/08/22 11:01:22 INFO bsp.BSPJobClient: Current supersteps number: 18
        11/08/22 11:01:28 INFO bsp.BSPJobClient: Current supersteps number: 20
        11/08/22 11:01:31 INFO bsp.BSPJobClient: Current supersteps number: 21
        11/08/22 11:01:40 INFO bsp.BSPJobClient: Current supersteps number: 23
        11/08/22 11:01:43 INFO bsp.BSPJobClient: Current supersteps number: 24
        11/08/22 11:01:46 INFO bsp.BSPJobClient: Current supersteps number: 27
        11/08/22 11:01:52 INFO bsp.BSPJobClient: Current supersteps number: 30
        11/08/22 11:01:58 INFO bsp.BSPJobClient: Current supersteps number: 33
        11/08/22 11:02:01 INFO bsp.BSPJobClient: Current supersteps number: 36
        11/08/22 11:02:04 INFO bsp.BSPJobClient: Current supersteps number: 39
        11/08/22 11:02:07 INFO bsp.BSPJobClient: Current supersteps number: 42
        11/08/22 11:02:10 INFO bsp.BSPJobClient: Current supersteps number: 47
        11/08/22 11:02:13 INFO bsp.BSPJobClient: Current supersteps number: 50
        11/08/22 11:02:16 INFO bsp.BSPJobClient: Current supersteps number: 57
        11/08/22 11:02:19 INFO bsp.BSPJobClient: Current supersteps number: 60
        11/08/22 11:02:22 INFO bsp.BSPJobClient: Current supersteps number: 68
        11/08/22 11:02:25 INFO bsp.BSPJobClient: Current supersteps number: 72
        11/08/22 11:02:28 INFO bsp.BSPJobClient: Current supersteps number: 81
        11/08/22 11:02:31 INFO bsp.BSPJobClient: Current supersteps number: 85
        11/08/22 11:02:34 INFO bsp.BSPJobClient: Current supersteps number: 93
        11/08/22 11:02:37 INFO bsp.BSPJobClient: Current supersteps number: 97
        11/08/22 11:02:40 INFO bsp.BSPJobClient: Current supersteps number: 102
        11/08/22 11:02:43 INFO bsp.BSPJobClient: The total number of supersteps: 102
        Job Finished in 99.684 seconds
        -------------------- RESULTS --------------------
        11/08/22 11:02:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        11/08/22 11:02:43 INFO compress.CodecPool: Got brand-new decompressor
        Chan-Santa Cruz | 63422
        Samiene | 66036
        Pimental | 78866
        Chaksom | 84903
        Sachiyama | 73654
        Itero de la Vega | 67042
        ....
        

        BTW, should we print all results?

        thomas.jungblut Thomas Jungblut added a comment -

        > Minor comment here,
        >
        > Your patch always contains trailing CRs. Please remove them, and see also HAMA-416.

        Yeah, I'll strip them this evening.
        Thanks for your tests.

        What result should we print instead?

        thomas.jungblut Thomas Jungblut added a comment -

        Made it with:

        tr -d '\r' < HAMA-423-v1.patch > ../Desktop/HAMA-423-withoutCRs.patch

        Sorry about that; I coded it on Windows.

        The whole example now executes faster than before, not just the partitioning. Seems to be a "good graph".

        udanax Edward J. Yoon added a comment -

        Thanks Thomas, I just committed this!


          People

          • Assignee:
            thomas.jungblut Thomas Jungblut
            Reporter:
            thomas.jungblut Thomas Jungblut