Add TypedBytes SerDe for transform

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      HIVE-708. Add TypedBytes SerDe for transform. (Namit Jain via zshao)

      Description

      Currently, LazySimpleSerDe is used to send data to the user transformation functions. It would be useful to let the user specify the format of that data instead.
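      As a rough illustration of the current behaviour, here is a minimal sketch of a transform script body. It assumes the default text format, in which each row reaches the script as one line of tab-separated columns and NULL is written as \N. The function names parse_row, format_row and transform are hypothetical, chosen only for this example. Every column arrives as a string, so the script has to cast values itself.

```python
def parse_row(line):
    """Split one tab-delimited text row into columns; the token \\N marks NULL."""
    fields = line.rstrip("\n").split("\t")
    return [None if f == "\\N" else f for f in fields]


def format_row(columns):
    """Join columns back into one tab-delimited output row."""
    return "\t".join("\\N" if c is None else str(c) for c in columns)


def transform(lines):
    """Hypothetical transform: double the hit count in the second column."""
    for line in lines:
        ip, hits = parse_row(line)[:2]
        # Type information is lost in the text format, so cast explicitly.
        doubled = None if hits is None else int(hits) * 2
        yield format_row([ip, doubled])
```

      In a real script the input would be sys.stdin and each yielded row would be printed to stdout.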

      Specifically, it would be very easy and useful to accommodate:

      (cut and paste from Venky's mail)

      Here's the typedbytes stuff that Dumbo uses.

      http://issues.apache.org/jira/browse/HADOOP-1722

      From:
      http://static.last.fm/johan/huguk-20090414/klaas-hadoop-1722.pdf
      Timings for IP count program on 300 GB of weblogs:
      Java: 8 minutes
      Dumbo with typed bytes: 10 minutes
      Hive: 13 minutes
      Dumbo without typed bytes: 16 minutes

      They also have a fast typed bytes decoder for Python (ctypedbytes), which is apparently 25% faster than the pure-Python version.
      http://github.com/klbostee/ctypedbytes/tree/master

      http://dumbotics.com/2009/05/31/dumbo-on-clouderas-distribution/
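      For readers unfamiliar with typed bytes, the following is a minimal sketch of the encoding for a few scalar types. It assumes the layout described for HADOOP-1722 as I understand it: a one-byte type code followed by a big-endian payload, with strings carrying a four-byte length prefix. The helper names write_value and read_value are hypothetical, and this is neither the Dumbo nor the Hive implementation.

```python
import struct
from io import BytesIO

# Type codes assumed from the Hadoop typed bytes format (scalars only).
BOOL, INT, LONG, DOUBLE, STRING = 2, 3, 4, 6, 7


def write_value(out, value):
    """Write one value as <type code><payload>, all big-endian."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        out.write(struct.pack(">bB", BOOL, int(value)))
    elif isinstance(value, int):
        if -2**31 <= value < 2**31:
            out.write(struct.pack(">bi", INT, value))
        else:
            out.write(struct.pack(">bq", LONG, value))
    elif isinstance(value, float):
        out.write(struct.pack(">bd", DOUBLE, value))
    elif isinstance(value, str):
        data = value.encode("utf-8")
        out.write(struct.pack(">bi", STRING, len(data)) + data)
    else:
        raise TypeError("unsupported type: %r" % type(value))


def read_value(inp):
    """Read one value written by write_value; raise EOFError at end of input."""
    code = inp.read(1)
    if not code:
        raise EOFError
    code = code[0]
    if code == BOOL:
        return inp.read(1) != b"\x00"
    if code == INT:
        return struct.unpack(">i", inp.read(4))[0]
    if code == LONG:
        return struct.unpack(">q", inp.read(8))[0]
    if code == DOUBLE:
        return struct.unpack(">d", inp.read(8))[0]
    if code == STRING:
        (length,) = struct.unpack(">i", inp.read(4))
        return inp.read(length).decode("utf-8")
    raise ValueError("unsupported type code: %d" % code)


# Round trip: values keep their types, unlike tab-delimited text.
buf = BytesIO()
for v in ["1.2.3.4", 42, 2**40, 1.5, True]:
    write_value(buf, v)
buf.seek(0)
decoded = [read_value(buf) for _ in range(5)]
```

      Because the type travels with each value, the transform script does not need to parse or cast text fields, which is the likely source of the speedup quoted above.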

        Attachments

        1. hive.708.4.patch
          215 kB
          Namit Jain
        2. hive.708.3.patch
          251 kB
          Namit Jain
        3. hive.708.2.patch
          104 kB
          Namit Jain
        4. hive.708.1.patch
          111 kB
          Namit Jain

              People

              • Assignee: Namit Jain
              • Reporter: Namit Jain