Hive / HIVE-1015

Java MapReduce wrapper for TRANSFORM/MAP/REDUCE scripts

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Contrib
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      TRANSFORM MAP REDUCE java

      Description

      Larry Ogrodnek has written a set of wrapper classes that make it possible
      to write Hive TRANSFORM/MAP/REDUCE scripts in Java in a style that
      more closely resembles conventional Hadoop MR programs.

      A blog post describing this library can be found here: http://dev.bizo.com/2009/10/hive-map-reduce-in-java.html

      The source code (with Apache license) is available here: http://github.com/ogrodnek/shmrj

      We should add this to contrib.
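
      To illustrate the style described above, here is a minimal sketch of a Java map script driven through a small wrapper. It assumes Hive's default streaming convention of one tab-delimited row per line on stdin and stdout. All names here (StreamingMapperSketch, RowMapper, RowOutput, run) are hypothetical and are not the API of the linked library or of the attached patch.

      ```java
      import java.io.BufferedReader;
      import java.io.IOException;
      import java.io.InputStreamReader;
      import java.io.OutputStreamWriter;
      import java.io.Reader;
      import java.io.Writer;
      import java.nio.charset.StandardCharsets;

      // Hypothetical sketch; not the wrapper classes from the patch.
      class StreamingMapperSketch {

          /** Hypothetical callback: receives one input row, emits zero or more rows. */
          interface RowMapper {
              void map(String[] row, RowOutput emit) throws IOException;
          }

          /** Hypothetical sink: writes one tab-delimited row per call. */
          interface RowOutput {
              void collect(String[] row) throws IOException;
          }

          /** Reads tab-delimited rows from in, maps each one, writes results to out. */
          static void run(Reader in, Writer out, RowMapper mapper) throws IOException {
              BufferedReader reader = new BufferedReader(in);
              RowOutput sink = row -> {
                  out.write(String.join("\t", row));
                  out.write("\n");
              };
              String line;
              while ((line = reader.readLine()) != null) {
                  // A limit of -1 keeps trailing empty columns.
                  mapper.map(line.split("\t", -1), sink);
              }
              out.flush();
          }

          public static void main(String[] args) throws IOException {
              // Example mapper: emit the first column and the number of columns.
              run(new InputStreamReader(System.in, StandardCharsets.UTF_8),
                  new OutputStreamWriter(System.out, StandardCharsets.UTF_8),
                  (row, emit) -> emit.collect(
                      new String[] { row[0], String.valueOf(row.length) }));
          }
      }
      ```

      A class like this would be invoked from the USING clause of a TRANSFORM or MAP query as a java command with the compiled class on the classpath; the exact command line depends on the deployment.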

      1. HIVE-1015.patch
        26 kB
        Larry Ogrodnek
      2. HIVE-1015.patch
        16 kB
        Larry Ogrodnek

        Activity

        Edward Capriolo added a comment -

        JAVA, JAVA, JAVA. I love it. Even our 'external scripts' can be Java now.

        Namit Jain added a comment -

        Committed. Thanks, Larry.

        Namit Jain added a comment -

        +1

        Looks good. Will commit if the tests pass.

        Larry Ogrodnek added a comment -

        Here's a new patch with a .q file using an example mapper and reducer.

        I also removed the dependency of these classes on Apache Commons Lang, since there was only a single use of StringUtils.join(), and it's one less thing to specify on the classpath in the USING clause.

        Thanks.
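
        For illustration, a single StringUtils.join() call can be replaced by a few lines of plain Java, which is the kind of change described above. This is a sketch only: the class and method names are hypothetical, and the code actually used in the patch is not shown in this ticket.

        ```java
        // Hypothetical helper; not taken from the patch.
        class JoinSketch {

            /** Joins parts with separator, with no third-party dependency. */
            static String join(String[] parts, String separator) {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < parts.length; i++) {
                    if (i > 0) {
                        sb.append(separator);
                    }
                    sb.append(parts[i]);
                }
                return sb.toString();
            }

            public static void main(String[] args) {
                // Builds one tab-delimited output row.
                System.out.println(join(new String[] { "key", "1", "x" }, "\t"));
            }
        }
        ```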

        Todd Lipcon added a comment -

        I think it's very slightly different:

        • UDF - only a 1:1 mapping on a single column
        • UDAF - requires implementation of Combiner-like functionality, best I can tell (haven't delved into this deeply, so apologies if you can do a reducer-only UDAF)
        • UDTF - perhaps supports the same functionality, but the syntax is a little less obvious than the MAP/REDUCE syntax. I think this feature could be implemented by an AST transform and some kind of interface-changing wrapper class for UDTF that makes it look more like the usual MR API.

        BTW, these thoughts definitely shouldn't block progress on this JIRA. I just wanted to throw the idea out there.
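
        As a point of comparison for the reducer-only case mentioned above, here is a minimal sketch of how a streaming REDUCE script can present an MR-like interface. It assumes the input arrives as tab-delimited rows already clustered by the first column, so rows sharing a key are consecutive. The names (StreamingReducerSketch, GroupReducer, run) are hypothetical, and buffering each group in a list is a simplification rather than how the patch necessarily works.

        ```java
        import java.io.BufferedReader;
        import java.io.IOException;
        import java.io.InputStreamReader;
        import java.io.OutputStreamWriter;
        import java.io.Reader;
        import java.io.Writer;
        import java.nio.charset.StandardCharsets;
        import java.util.ArrayList;
        import java.util.List;

        // Hypothetical sketch; not the wrapper classes from the patch.
        class StreamingReducerSketch {

            /** Hypothetical callback: receives every row that shares one key. */
            interface GroupReducer {
                void reduce(String key, List<String[]> rows, Writer out) throws IOException;
            }

            /** Groups consecutive rows by their first column and reduces each group. */
            static void run(Reader in, Writer out, GroupReducer reducer) throws IOException {
                BufferedReader reader = new BufferedReader(in);
                String currentKey = null;
                List<String[]> group = new ArrayList<>();
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] row = line.split("\t", -1);
                    if (currentKey != null && !currentKey.equals(row[0])) {
                        // Key changed: the previous group is complete.
                        reducer.reduce(currentKey, group, out);
                        group = new ArrayList<>();
                    }
                    currentKey = row[0];
                    group.add(row);
                }
                if (currentKey != null) {
                    reducer.reduce(currentKey, group, out); // flush the last group
                }
                out.flush();
            }

            public static void main(String[] args) throws IOException {
                // Example reducer: emit each key with its row count.
                run(new InputStreamReader(System.in, StandardCharsets.UTF_8),
                    new OutputStreamWriter(System.out, StandardCharsets.UTF_8),
                    (key, rows, out) -> out.write(key + "\t" + rows.size() + "\n"));
            }
        }
        ```

        No combiner-style partial aggregation is involved here; the callback only ever sees complete groups, which is the difference from a UDAF that the comment points at.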

        Namit Jain added a comment -

        Isn't that the same as UDF/UDAF/UDTF?

        Todd Lipcon added a comment -

        Related thought: it would be nice to be able to write MAP/REDUCE as straight Java without the overhead of streaming and serde. Is there a ticket already for this?

        Namit Jain added a comment -

        Larry, can you also add a test .q file which uses the new wrapper classes in the map/reduce script?
        We can definitely get it in 0.5.

        Carl Steinbach added a comment -

        +1

        It would be nice to have this in 0.5.0.

        Larry Ogrodnek added a comment -

        Attached is a patch against the HIVE trunk adding the code that Carl describes.

        Also included is a small change to common-build.xml that keeps inner classes from being treated as test cases themselves.


          People

          • Assignee:
            Larry Ogrodnek
            Reporter:
            Carl Steinbach
          • Votes:
            0
            Watchers:
            6
