Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
To leverage the changes introduced in HAWQ-1177, extend the optimization to the other Hive profiles. Additional information, such as the DELIMITER, needs to be included in the user metadata.
Changes needed:
- Enhance the Metadata API to add new attributes: outputFormats and outputParameters;
- The Hive and HiveORC profiles should support only the GPDBWritable format;
- The HiveText and HiveRC profiles should support both the TEXT and GPDBWritable formats;
- Unify the HiveUserData data structure so that it is the same across all Hive* profiles;
- The Bridge should read each fragment using the optimal profile, as read from the fragment information;
- The optimal profile should be determined from the file's input format (org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: HiveORC; org.apache.hadoop.hive.ql.io.RCFileInputFormat: HiveRC; org.apache.hadoop.mapred.TextInputFormat: HiveText);
- The default profile is Hive;
- If a Hive table uses org.apache.hadoop.mapred.TextInputFormat but also contains complex types, the Hive profile should be used (this limitation should be addressed in HAWQ-1265);
- If the table is homogeneous (all input files have the same output format), the Bridge uses that format. Otherwise, if the table is heterogeneous, GPDBWritable should be used;
- Add a new property, outputFormat, to pxf-profiles-default.xml, specifying the default output format of each profile.
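The profile and output-format rules in the list above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the actual PXF implementation: the class and method names (ProfileSelector, chooseProfile, supportedFormats, defaultOutputFormat, chooseOutputFormat) are hypothetical, and the per-profile default output formats (TEXT for HiveText and HiveRC, GPDBWritable otherwise) are assumed values of the kind the outputFormat property would supply.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the profile/output-format selection rules.
// Names are illustrative only; this is not the real PXF API.
public class ProfileSelector {

    enum OutputFormat { TEXT, GPDBWritable }

    static final String ORC_INPUT = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat";
    static final String RC_INPUT = "org.apache.hadoop.hive.ql.io.RCFileInputFormat";
    static final String TEXT_INPUT = "org.apache.hadoop.mapred.TextInputFormat";

    /** Picks the optimal profile for a fragment from its input format class. */
    static String chooseProfile(String inputFormat, boolean hasComplexTypes) {
        switch (inputFormat) {
            case ORC_INPUT:
                return "HiveORC";
            case RC_INPUT:
                return "HiveRC";
            case TEXT_INPUT:
                // Complex types fall back to the generic Hive profile (HAWQ-1265).
                return hasComplexTypes ? "Hive" : "HiveText";
            default:
                return "Hive"; // default profile
        }
    }

    /** Output formats each profile supports, per the issue description. */
    static Set<OutputFormat> supportedFormats(String profile) {
        if (profile.equals("HiveText") || profile.equals("HiveRC")) {
            return EnumSet.of(OutputFormat.TEXT, OutputFormat.GPDBWritable);
        }
        return EnumSet.of(OutputFormat.GPDBWritable); // Hive, HiveORC
    }

    // Assumption: the default output format a profile's outputFormat
    // property would declare in pxf-profiles-default.xml.
    static OutputFormat defaultOutputFormat(String profile) {
        return (profile.equals("HiveText") || profile.equals("HiveRC"))
                ? OutputFormat.TEXT
                : OutputFormat.GPDBWritable;
    }

    /** Homogeneous table: use its single format; heterogeneous: GPDBWritable. */
    static OutputFormat chooseOutputFormat(List<String> fragmentProfiles) {
        Set<OutputFormat> formats = EnumSet.noneOf(OutputFormat.class);
        for (String profile : fragmentProfiles) {
            formats.add(defaultOutputFormat(profile));
        }
        return formats.size() == 1 ? formats.iterator().next() : OutputFormat.GPDBWritable;
    }

    public static void main(String[] args) {
        System.out.println(chooseProfile(TEXT_INPUT, false) + " "
                + chooseProfile(TEXT_INPUT, true) + " "
                + chooseProfile(ORC_INPUT, false));
        System.out.println(chooseOutputFormat(List.of("HiveText", "HiveRC")));
        System.out.println(chooseOutputFormat(List.of("HiveText", "HiveORC")));
    }
}
```

Running main prints `HiveText Hive HiveORC`, then `TEXT`, then `GPDBWritable`. Collecting the per-fragment formats into a set keeps the homogeneity check to a single size comparison, and an empty or mixed set falls back to GPDBWritable, which every profile supports.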