  1. Apache HAWQ
  2. HAWQ-1228

Use profile based on file format in HCatalog integration (HiveRC, HiveText profiles)

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0.0-incubating
    • Component/s: PXF
    • Labels: None

    Description

      To leverage the changes introduced in HAWQ-1177, extend the optimization to the other Hive profiles. Additional information (e.g. DELIMITER) needs to be included in the user metadata.

      Changes needed:

      • Enhance the Metadata API to add new attributes: outputFormats, outputParameters;
      • The Hive and HiveORC profiles should support only the GPDBWritable format;
      • The HiveText and HiveRC profiles should support both the TEXT and GPDBWritable formats;
      • Unify the HiveUserData data structures so they are the same across all Hive profiles;
      • The Bridge should read each fragment using the optimal profile taken from the fragment information;
      • The optimal profile should be determined from the file's input format (org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - HiveORC, org.apache.hadoop.hive.ql.io.RCFileInputFormat - HiveRC, org.apache.hadoop.mapred.TextInputFormat - HiveText);
      • The default profile is Hive;
      • If a Hive table uses org.apache.hadoop.mapred.TextInputFormat but also has some complex types, the Hive profile should be used (this limitation should be addressed in HAWQ-1265);
      • If the table is homogeneous (all input files have the same output format), the Bridge uses the table's format. Otherwise, if the table is heterogeneous, GPDBWritable should be used;
      • Add a new property, outputFormat, to pxf-profiles-default.xml; it specifies the profile's default output format.
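
      The selection rules listed above can be sketched roughly as follows. This is a minimal illustration, not the actual PXF code: the class HiveProfileSketch and its methods are hypothetical names, and it assumes that HiveText and HiveRC default to TEXT while Hive and HiveORC default to GPDBWritable, and that an empty fragment list falls back to GPDBWritable.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; names are illustrative and not taken from the PXF source.
class HiveProfileSketch {
    static final String ORC_INPUT_FORMAT = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat";
    static final String RC_INPUT_FORMAT = "org.apache.hadoop.hive.ql.io.RCFileInputFormat";
    static final String TEXT_INPUT_FORMAT = "org.apache.hadoop.mapred.TextInputFormat";

    // Chooses the profile for one fragment from its input format.
    static String chooseProfile(String inputFormat, boolean hasComplexTypes) {
        if (ORC_INPUT_FORMAT.equals(inputFormat)) {
            return "HiveORC";
        }
        if (RC_INPUT_FORMAT.equals(inputFormat)) {
            return "HiveRC";
        }
        if (TEXT_INPUT_FORMAT.equals(inputFormat)) {
            // Text tables with complex types fall back to Hive (see HAWQ-1265).
            return hasComplexTypes ? "Hive" : "HiveText";
        }
        // Any other input format uses the default profile.
        return "Hive";
    }

    // Assumed per-profile default output format (the outputFormat property).
    static String defaultOutputFormat(String profile) {
        if ("HiveText".equals(profile) || "HiveRC".equals(profile)) {
            return "TEXT";
        }
        return "GPDBWritable";
    }

    // Homogeneous table: keep the shared format. Heterogeneous: GPDBWritable.
    static String chooseTableOutputFormat(List<String> fragmentProfiles) {
        Set<String> formats = new HashSet<>();
        for (String profile : fragmentProfiles) {
            formats.add(defaultOutputFormat(profile));
        }
        if (formats.size() == 1) {
            return formats.iterator().next();
        }
        // Mixed formats, or no fragments at all (assumption): use GPDBWritable.
        return "GPDBWritable";
    }
}
```

      Under these assumptions, a table whose fragments all resolve to HiveText is read as TEXT, while a table mixing HiveText and HiveORC fragments is read as GPDBWritable.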

            People

              odiachenko Oleksandr Diachenko
              odiachenko Oleksandr Diachenko