Details
- New Feature
- Status: Closed
- Major
- Resolution: Fixed
- None
- None
- Reviewed
- HIVE-708. Add TypedBytes SerDe for transform. (Namit Jain via zshao)
Description
Currently, LazySimpleSerDe is used to send data to the user transformation functions. It would be useful to let the user specify the format of the data.
Specifically, it would be very easy and useful to accommodate:
(cut and paste from Venky's mail)
Here's the typedbytes stuff that Dumbo uses.
http://issues.apache.org/jira/browse/HADOOP-1722
From:
http://static.last.fm/johan/huguk-20090414/klaas-hadoop-1722.pdf
Timings for an IP count program on 300 GB of weblogs:
Java: 8 minutes
Dumbo with typed bytes: 10 minutes
Hive: 13 minutes
Dumbo without typed bytes: 16 minutes
They also have a fast typed bytes decoder for Python (ctypedbytes), which is apparently 25% faster than the plain Python version.
http://github.com/klbostee/ctypedbytes/tree/master
http://dumbotics.com/2009/05/31/dumbo-on-clouderas-distribution/
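To make the request more concrete, here is a minimal sketch of what a typed bytes record looks like on the wire, assuming the layout described in HADOOP-1722: a one-byte type code followed by a big-endian payload (0 = bytes, 3 = int, 4 = long, 6 = double, 7 = string). The helper names `encode` and `decode` are hypothetical and are not part of Dumbo, ctypedbytes, or Hive. Only a subset of the types is handled.

```python
import struct

# Type codes as described for typed bytes in HADOOP-1722 (subset only).
TYPE_BYTES, TYPE_INT, TYPE_LONG, TYPE_DOUBLE, TYPE_STRING = 0, 3, 4, 6, 7


def encode(value):
    """Encode one Python value as typed bytes (hypothetical helper)."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, bytes):
        # type code, 4-byte length, then the raw bytes
        return struct.pack(">bi", TYPE_BYTES, len(value)) + value
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return struct.pack(">bi", TYPE_INT, value)
        return struct.pack(">bq", TYPE_LONG, value)
    if isinstance(value, float):
        return struct.pack(">bd", TYPE_DOUBLE, value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        # type code, 4-byte length, then the UTF-8 bytes
        return struct.pack(">bi", TYPE_STRING, len(data)) + data
    raise TypeError("unsupported type: %r" % type(value))


def decode(buf, offset=0):
    """Decode one value starting at offset; return (value, next_offset)."""
    code = buf[offset]
    offset += 1
    if code == TYPE_INT:
        return struct.unpack_from(">i", buf, offset)[0], offset + 4
    if code == TYPE_LONG:
        return struct.unpack_from(">q", buf, offset)[0], offset + 8
    if code == TYPE_DOUBLE:
        return struct.unpack_from(">d", buf, offset)[0], offset + 8
    if code in (TYPE_BYTES, TYPE_STRING):
        (length,) = struct.unpack_from(">i", buf, offset)
        offset += 4
        raw = bytes(buf[offset:offset + length])
        value = raw.decode("utf-8") if code == TYPE_STRING else raw
        return value, offset + length
    raise ValueError("unsupported type code: %d" % code)


# A (key, value) pair such as an IP address and its count.
row = encode("10.0.0.1") + encode(42)
ip, pos = decode(row)
count, end = decode(row, pos)
```

Because every value carries its own type and length, a transform script can read ints and strings directly instead of parsing delimited text, which is the overhead the timings above point at.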
Attachments
Issue Links
- is duplicated by
  - HIVE-669 SELECT TRANSFORM / MAP / REDUCE to support optional ROW FORMAT (Resolved)
- is related to
  - HIVE-785 Add RecordWriter for ScriptOperator (Closed)
  - HADOOP-1722 Make streaming to handle non-utf8 byte array (Closed)
  - HIVE-786 Move ql/.../ql/util/typedbytes and ql/.../ql/exec/TypedBytesRecordReader.java to contrib (Closed)
- relates to
  - HIVE-751 Rename serde/serdeFormat etc in Hive.g for readability (Closed)