Hive / HIVE-51

Generate and accept JSON as the input-output format from mappers and reducers

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      set mapred.data.format=JSON;
      ....
      MAP USING 'python filter.py'
      ....;

      With this setting, filter.py would receive each record as a JSON-formatted dictionary of the columns instead of a tab-delimited string, for example:

      { "column1": "value1", "column2": [1, 2, 3] }

      It would in turn produce JSON as its output.
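
To make the proposal concrete, here is a minimal sketch of what a script like filter.py might look like if records arrived as JSON. It assumes one JSON object per line on the input stream, which the issue does not specify; the `keep` predicate and the `run` helper are hypothetical names used only for illustration.

```python
import json


def keep(record):
    """Hypothetical filter: keep records whose column2 list is non-empty."""
    return bool(record.get("column2"))


def run(lines, out):
    """Read one JSON object per line (an assumption) and write kept records as JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # columns arrive as a dict, lists stay lists
        if keep(record):
            out.write(json.dumps(record) + "\n")
```

Used as a mapper script, this would be invoked as `run(sys.stdin, sys.stdout)`. No column list or manual splitting appears anywhere in the script.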

      This should be done so that the JSON is not transmitted back and forth over the network; it would be generated on the fly on the mapper node, and converted back to the standard format used (tab-delimited, I assume).
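
The node-local conversion described above could be sketched roughly as follows. This is an illustration in Python only; Hive itself would presumably do this inside the query processor. The function names are hypothetical, and the handling of missing columns is an assumption.

```python
import json


def encode_row(column_names, line):
    """Turn one tab-delimited row into a JSON object keyed by column name."""
    values = line.rstrip("\n").split("\t")
    return json.dumps(dict(zip(column_names, values)))


def decode_row(column_names, json_text):
    """Turn the script's JSON output back into a tab-delimited row."""
    record = json.loads(json_text)
    # Assumption: a column missing from the script's output becomes an empty field.
    return "\t".join(str(record.get(name, "")) for name in column_names)
```

In this sketch every value is a string because it starts from tab-delimited text. A real implementation would know the column types from the table schema and could emit typed JSON values instead.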

      This seems like the simplest way to encode type information in the input to mappers. It would also remove the need for silly boilerplate code that takes a list of expected input column names, reads the input stream, splits each line, and builds a dictionary of

      {column name: value}

      for every record.
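
For comparison, the kind of boilerplate this refers to might look like the sketch below. The column names and types are hypothetical, and `int` stands in for whatever per-column parsing a script author would have to restate by hand.

```python
# Hypothetical schema: names and types are repeated in the script by hand.
COLUMNS = [("column1", str), ("column2", int)]


def parse_record(line, columns=COLUMNS):
    """Split one tab-delimited line and build a {column name: value} dict."""
    fields = line.rstrip("\n").split("\t")
    # Pair each field with its expected name and cast it to the expected type.
    return {name: cast(field) for (name, cast), field in zip(columns, fields)}
```

With JSON input, both the name list and the casts would come from the data itself rather than from the script.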

      Output schemas (USING '' AS ...) might also be redundant with this in place, but I'm not sure if that is doable.

            People

              Assignee: Unassigned
              Reporter: Venky Iyer (indigoviolet)
              Votes: 1
              Watchers: 7
