[HIVE-51] Generate and accept JSON as the input-output format from mappers and reducers - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Query Processor
Labels: None

Description

set mapred.data.format=JSON;
....
MAP USING 'python filter.py'
....;

would mean that filter.py receives each record as a JSON-formatted dictionary of the columns instead of a tab-delimited string:

{ "column1": "value1", "column2": [1, 2, 3] }

etc

It would in turn produce JSON.
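As a rough illustration of what a script like filter.py could look like under this proposal, here is a minimal sketch. It assumes one JSON object per line on stdin and one per line on stdout; the ticket does not specify the framing, and the `keep` condition and `run` helper are hypothetical names chosen for the example.

```python
import json
import sys


def keep(row):
    # Hypothetical filter condition, for illustration only:
    # accept rows whose "column2" list is non-empty.
    return len(row.get("column2", [])) > 0


def run(lines):
    """Yield one JSON-encoded output row per accepted input row."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)  # values arrive already typed (lists, numbers, ...)
        if keep(row):
            yield json.dumps(row)


if __name__ == "__main__":
    # Assumed framing: one JSON object per line in, one per line out.
    for out in run(sys.stdin):
        print(out)
```

The script never has to know the column order or parse the values itself, which is the point of the request.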

This should be done so that the JSON is never transmitted back and forth over the network: it would be generated on the fly on the mapper node, and the script's output would be converted back to the standard format used (tab-delimited, I assume).

This seems like the simplest way to encode type information in the input to mappers; it would also remove the need for silly boilerplate code that takes a list of expected input column names, reads the input stream, splits each line, and builds a dictionary of

{column name: value}

on every record.
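For comparison, the boilerplate described above tends to look something like the following sketch. The column list and the `parse_row` helper are hypothetical; the only thing taken from the ticket is that input arrives as tab-delimited text.

```python
import sys

# Hypothetical column list; each script has to hard-code or be passed its own.
EXPECTED_COLUMNS = ["column1", "column2"]


def parse_row(line, columns=EXPECTED_COLUMNS):
    """Split one tab-delimited record and pair each field with its column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(columns):
        raise ValueError("expected %d fields, got %d" % (len(columns), len(fields)))
    # Every value is a plain string here; no type information survives.
    return dict(zip(columns, fields))


if __name__ == "__main__":
    for line in sys.stdin:
        row = parse_row(line)
        # Emit the row unchanged, again as tab-delimited text.
        print("\t".join(row[name] for name in EXPECTED_COLUMNS))
```

Note that a list-valued column comes through as an opaque string, so the script would need further ad hoc parsing to recover it.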

Output schemas (USING '' AS ...) might also be redundant with this in place, but I'm not sure if that is doable.

Issue Links

is blocked by

HIVE-163 support loading json data into hive (Closed)

is related to

HIVE-658 Allow conversions from string to complex types (Open)

HIVE-669 SELECT TRANSFORM / MAP / REDUCE to support optional ROW FORMAT (Resolved)

relates to

HIVE-348 Provide type information to custom mappers and reducers. (Open)

People

Assignee: Unassigned

Reporter: Venky Iyer

Votes: 1

Watchers: 7

Dates

Created: 05/Nov/08 09:41

Updated: 12/Jan/10 22:41