  1. Hive
  2. HIVE-8371

HCatStorer should fail by default when publishing to an existing partition

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.13.0, 0.14.0, 0.13.1
    • Fix Version/s: None
    • Component/s: HCatalog

      Description

      In Hive 0.12 and earlier (or in previous HCatalog releases), HCatStorer would fail if the partition already existed, either before launching the job or during commit, depending on the partitioning. HIVE-6406 changed that behavior so that HCatStorer appends by default. This causes data quality issues, since a rerun (or duplicate run) no longer fails as it used to and instead just appends to the partition.

      A preferable approach would be to keep HCatStorer's original behavior (fail on a duplicate publish) and support append through an option. Overwrite could be implemented in a similar fashion. For example:

      store A into 'db.table' using org.apache.hive.hcatalog.pig.HCatStorer('partspec', '', ' -append');
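      To make the proposed semantics concrete, here is a minimal sketch of the publish decision in Python. It is not HCatalog code: the in-memory `partitions` dict stands in for the metastore, `publish`, `parse_mode` and `PartitionExistsError` are hypothetical names, and the `-overwrite` flag is an assumption (the issue only suggests that overwrite could work "in a similar fashion"). Only `-append` comes from the example above.

```python
from enum import Enum


class PublishMode(Enum):
    FAIL = "fail"            # proposed default: refuse to touch an existing partition
    APPEND = "append"        # opt-in via '-append' (from the issue's example)
    OVERWRITE = "overwrite"  # opt-in via '-overwrite' (hypothetical flag name)


class PartitionExistsError(Exception):
    """Raised when publishing to an existing partition without an explicit option."""


def parse_mode(options: str) -> PublishMode:
    """Map the storer's option string to a publish mode; no flag means FAIL."""
    flags = options.split()
    wants_append = "-append" in flags
    wants_overwrite = "-overwrite" in flags
    if wants_append and wants_overwrite:
        raise ValueError("-append and -overwrite are mutually exclusive")
    if wants_append:
        return PublishMode.APPEND
    if wants_overwrite:
        return PublishMode.OVERWRITE
    return PublishMode.FAIL


def publish(partitions: dict, partition: str, rows: list, options: str = "") -> None:
    """Publish rows to a partition, failing on duplicates unless an option says otherwise."""
    mode = parse_mode(options)
    if partition not in partitions:
        # New partition: every mode behaves the same.
        partitions[partition] = list(rows)
        return
    if mode is PublishMode.FAIL:
        raise PartitionExistsError(f"partition {partition!r} already exists")
    if mode is PublishMode.APPEND:
        partitions[partition].extend(rows)
    else:
        partitions[partition] = list(rows)
```

      With this shape, an accidental rerun raises `PartitionExistsError` and leaves the existing data untouched, while a job that really intends to add data has to say so explicitly by passing `' -append'`.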

            People

            • Assignee:
              thiruvel Thiruvel Thirumoolan
              Reporter:
              thiruvel Thiruvel Thirumoolan
            • Votes:
              0
              Watchers:
              6