Accumulo / ACCUMULO-532

Add BSP input/output formats to client package

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: contrib
    • Labels:
      None

      Description

      I've just written basic BSP input formats and their unit tests.

      1. bsp.patch
        69 kB
        Edward J. Yoon

          Activity

          Edward J. Yoon added a comment -

          I can't attach the file here. It seems to be a problem with the JIRA configuration.

          I've uploaded a patch on my server.
          Patch is available at http://udanax.org/bsp.patch

          [INFO] ------------------------------------------------------------------------
          [INFO] Reactor Summary:
          [INFO]
          [INFO] accumulo .......................................... SUCCESS [10.074s]
          [INFO] cloudtrace ........................................ SUCCESS [4.911s]
          [INFO] accumulo-start .................................... SUCCESS [20.856s]
          [INFO] accumulo-core ..................................... SUCCESS [53.895s]
          [INFO] accumulo-server ................................... SUCCESS [22.076s]
          [INFO] accumulo-examples ................................. SUCCESS [0.407s]
          [INFO] examples-simple ................................... SUCCESS [3.784s]
          [INFO] accumulo-wikisearch ............................... SUCCESS [0.032s]
          [INFO] wikisearch-ingest ................................. SUCCESS [18.361s]
          [INFO] wikisearch-query .................................. SUCCESS [13.170s]
          [INFO] wikisearch-query-war .............................. SUCCESS [10.451s]
          [INFO] accumulo-assemble ................................. SUCCESS [0.854s]
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 2:40.944s
          [INFO] Finished at: Fri Apr 13 16:31:04 KST 2012
          [INFO] Final Memory: 117M/278M
          [INFO] ------------------------------------------------------------------------
          
          Billie Rinaldi added a comment -

          Ed,

          I've added you to the Accumulo contributors list in JIRA and checked in a modified patch to contrib/trunk/bsp as a module of a new accumulo-contrib project.

          I modified the input and output formats to have the following form because I didn't want to have so much duplicate code. This way, if the MR i/o code is changed, the BSP i/o formats will benefit from it directly.

          public class AccumuloInputFormat extends org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat implements org.apache.hama.bsp.InputFormat<Key,Value>
          
          public class AccumuloOutputFormat extends org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat implements org.apache.hama.bsp.OutputFormat<Text,Mutation>
          

          Let me know if you see any issues with this. It could probably use some more testing. I was able to get the unit tests working (even the part commented out in the patch) but I had to set fake input and output paths. It seems that BSP doesn't initialize the RecordReader and RecordWriter unless the configuration options "bsp.input.dir" and "bsp.output.dir" are set.

          bspjob.setInputPath(new Path("test"));
          bspjob.setOutputPath(new Path("test"));
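
The reuse pattern described in this comment (extend the MapReduce format, implement the BSP interface) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the checked-in code: `MapReduceStyleInputFormat`, `BspStyleInputFormat`, `computeRanges`, and the string-based splits are hypothetical stand-ins for the real Accumulo and Hama types, whose method signatures are not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {
    // Hypothetical stand-in for the MapReduce-side format that owns the shared logic.
    static class MapReduceStyleInputFormat {
        // Shared splitting logic; subclasses inherit any later fixes made here.
        protected List<String> computeRanges(String table, int parts) {
            List<String> ranges = new ArrayList<>();
            for (int i = 0; i < parts; i++) {
                ranges.add(table + "#" + i);
            }
            return ranges;
        }
    }

    // Hypothetical stand-in for the interface the BSP framework expects.
    interface BspStyleInputFormat {
        String[] getSplits(String table, int numTasks);
    }

    // The adapter: inherits the shared logic and only translates it
    // into the shape the BSP-side interface asks for.
    static class BspInputFormat extends MapReduceStyleInputFormat
            implements BspStyleInputFormat {
        @Override
        public String[] getSplits(String table, int numTasks) {
            return computeRanges(table, numTasks).toArray(new String[0]);
        }
    }

    public static void main(String[] args) {
        BspStyleInputFormat format = new BspInputFormat();
        for (String split : format.getSplits("mytable", 3)) {
            System.out.println(split);
        }
    }
}
```

Because the adapter holds no splitting logic of its own, a change to the base class reaches the BSP side without a second copy to maintain, which is the motivation given above for avoiding duplicate code.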
          
          Edward J. Yoon added a comment -

          +1 for reusing org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat; I don't see any problem.

          I've fixed the initialization problem of the record reader/writer objects (HAMA-562) for the next version.

          Keith Turner added a comment -

          Can this be closed or resolved?

          Edward J. Yoon added a comment -

          HAMA is about to graduate to TLP status. Please wait until we graduate and release a new version. Then I'll resolve these issues (in/out formats, BSP example, and documentation).

          Billie Rinaldi added a comment -

          I restructured the contrib so the modules can be versioned separately. The new location is https://svn.apache.org/repos/asf/accumulo/contrib/bsp/trunk/.

          Christopher Tubbs added a comment -

          I'm looking at this a bit more closely, and was going to update this contrib to use the Hama 0.6.0 release (the latest release as of this writing) in the process of implementing ACCUMULO-769. However, I noticed a problem. Is there a reason BSPJob does not extend Job, and BSPJobContext does not extend JobContext? It seems it would be very convenient if they did.

          Edward J. Yoon added a comment -

          >> It seems it would be very convenient if it did.

          Thanks for your comment. We'll check.
          And sorry for my slow progress; we (the Hama team) are still under heavy development.

          Edward J. Yoon added a comment -

          Oh, do you mean Hadoop's Job and JobContext? Hama BSP has a totally different Job interface.

          John Vines added a comment -

          Christopher Tubbs, do you have any follow-up on this patch, since you were the one with questions about it?

          Eric Newton added a comment -

          Removing the 1.5.0 fix version since this will not be released with 1.5.0 (it is versioned separately in contrib).

          Christopher Tubbs added a comment -

          No further questions, but with some of the changes in 1.5.0, I'm not sure this patch currently compiles against 1.5. There have also been some helpful changes to support both the Hadoop mapred and mapreduce APIs that will be useful to this contribution (especially the configuration utilities). It's up to Edward J. Yoon whether he wants to leave this ticket open, or close it and create another ticket to improve this add-on.

          Edward J. Yoon added a comment -

          Let's close this issue for now. Once Hama BSP and Apache MRQL[1] are ready to use, I'll try to improve BSP computing on top of Accumulo.

          1. http://wiki.apache.org/incubator/MRQLProposal


            People

            • Assignee:
              Edward J. Yoon
              Reporter:
              Edward J. Yoon
            • Votes:
              0
              Watchers:
              6
