[HIVE-17181] HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0, 3.0.0
Fix Version/s: 2.4.0, 3.0.0
Component/s: HCatalog
Labels:
None

Target Version/s: 2.4.0, 3.0.0

Description

Map/Reduce jobs that use the HCatalog APIs to write to Hive tables with dynamic partitioning are expected to call the following API methods:

HCatOutputFormat.setOutput() to indicate which table/partitions to write to. This call populates the OutputJobInfo with details fetched from the Metastore.
HCatOutputFormat.setSchema() to indicate the output-schema for the data being written.

A common mistake is to invoke HCatOutputFormat.setSchema() as follows:

HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));

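Unfortunately, getTableSchema() returns only the record schema (the table's data columns), not the complete table schema. With dynamic partitioning, the output schema must also include the partition-key columns, because their values in each record determine which partition the record is written to. M/R jobs need a better API for retrieving the complete table schema, including partition keys.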
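A minimal, self-contained Java sketch of the distinction described above. It uses hypothetical stand-in types and methods (`OutputSchemaSketch`, `Column`, `completeOutputSchema`, `missingPartitionKeys`), not the HCatalog API; in a real job, `HCatSchema` and `HCatFieldSchema` play these roles. It assumes, as in Hive, that partition keys are stored separately from the data columns and that column names compare case-insensitively.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model for illustration only; this is NOT the
// HCatalog API. In a real M/R job, HCatSchema/HCatFieldSchema fill these roles.
public class OutputSchemaSketch {

    /** One column: a name plus a Hive type name such as "int" or "string". */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public String toString() {
            return name + ":" + type;
        }
    }

    /** Complete output schema: data columns first, then partition-key columns. */
    static List<Column> completeOutputSchema(List<Column> dataColumns, List<Column> partitionKeys) {
        List<Column> all = new ArrayList<>(dataColumns);
        all.addAll(partitionKeys);
        return all;
    }

    /** Names of partition keys that a proposed output schema fails to include. */
    static List<String> missingPartitionKeys(List<Column> outputSchema, List<Column> partitionKeys) {
        List<String> missing = new ArrayList<>();
        for (Column key : partitionKeys) {
            boolean found = false;
            for (Column c : outputSchema) {
                // Hive column names are case-insensitive, so compare accordingly.
                if (c.name.equalsIgnoreCase(key.name)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(key.name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Column> data = List.of(new Column("id", "int"), new Column("msg", "string"));
        List<Column> partKeys = List.of(new Column("dt", "string"), new Column("region", "string"));

        // The common mistake: using the record schema alone as the output schema.
        System.out.println("record schema missing: " + missingPartitionKeys(data, partKeys));

        // Appending the partition keys yields a schema that covers every dynamic-partition key.
        List<Column> full = completeOutputSchema(data, partKeys);
        System.out.println("complete schema: " + full);
        System.out.println("complete schema missing: " + missingPartitionKeys(full, partKeys));
    }
}
```

Running `main` shows that the record schema on its own omits both dynamic-partition keys (`dt`, `region`), while the concatenated schema covers them. Appending the keys after the data columns mirrors Hive's usual column order for a partitioned table.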

Attachments

HIVE-17181.1.patch, 2 kB, Mithun Radhakrishnan, 27/Jul/17 04:28
HIVE-17181.1-branch-2.patch, 6 kB, Mithun Radhakrishnan, 15/Aug/17 23:39
HIVE-17181.2.patch, 6 kB, Mithun Radhakrishnan, 04/Aug/17 22:32
HIVE-17181.3.patch, 6 kB, Mithun Radhakrishnan, 07/Aug/17 18:04

People

Assignee: Mithun Radhakrishnan
Reporter: Mithun Radhakrishnan
Votes: 0
Watchers: 3

Dates

Created: 27/Jul/17 04:23
Updated: 22/May/18 23:58
Resolved: 16/Aug/17 17:21