Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
set mapred.data.format=JSON;
....
MAP USING 'python filter.py'
....;
would mean that filter.py receives each row as a JSON-formatted dictionary of the columns instead of a tab-delimited string, for example:
{ column1: value1, column2: [1,2,3] }
It would in turn produce JSON.
This should be done so that the JSON is not transmitted back and forth over the network; it would be generated on the fly on the mapper node, and converted back to the standard format used (tab-delimited, I assume).
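A minimal sketch of what filter.py could look like under this proposal. It assumes one JSON object per input line, which the issue does not specify. The `transform` rule and the column names are hypothetical, chosen only to match the example dictionary above.

```python
import json


def transform(record):
    """Hypothetical per-record logic: keep rows whose column2 is non-empty."""
    if record.get("column2"):
        return record
    return None


def run(lines, out):
    # Assumption: each input line holds exactly one JSON object.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        result = transform(json.loads(line))
        if result is not None:
            # Emit JSON back, one object per line, as the proposal describes.
            out.write(json.dumps(result) + "\n")


# In filter.py itself this would be driven by the standard streams:
#     import sys
#     run(sys.stdin, sys.stdout)
```

Column types such as the list in column2 arrive already decoded, so the script needs no knowledge of the column order or of how each field was serialized.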
This seems like the simplest way to encode type information in the input to mappers; it would also remove the need for silly boilerplate code that takes a list of expected input column names, reads the input stream, splits it up, and builds a dictionary of {column name: value} on every record.
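For comparison, the boilerplate described above typically looks something like the following sketch. The helper name `parse_record` and the column names are hypothetical; the point is that every script has to carry its own copy of the column list, and every value comes back as a plain string.

```python
def parse_record(line, column_names):
    """Split one tab-delimited line and pair each field with its column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(column_names):
        raise ValueError(
            "expected %d fields, got %d" % (len(column_names), len(fields))
        )
    # All values are strings here; any type information is lost.
    return dict(zip(column_names, fields))


# Hypothetical column list that the script must keep in sync with the query.
EXPECTED_COLUMNS = ["column1", "column2"]

# In a real script this would be applied to every line of sys.stdin:
#     for line in sys.stdin:
#         record = parse_record(line, EXPECTED_COLUMNS)
```

If the query changes its column order or adds a column, the script's column list has to be updated by hand, which is the maintenance cost the JSON format would avoid.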
Output schemas (USING '' AS ...) might also be redundant with this in place, but I'm not sure if that is doable.
Issue Links
- is blocked by
  - HIVE-163 support loading json data into hive (Closed)
- is related to
  - HIVE-658 Allow conversions from string to complex types (Open)
  - HIVE-669 SELECT TRANSFORM / MAP / REDUCE to support optional ROW FORMAT (Resolved)
- relates to
  - HIVE-348 Provide type information to custom mappers and reducers. (Open)