  Falcon
  FALCON-997

Injecting the $falcon_output_path variable into Falcon process


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6, 0.7, trunk
    • Fix Version/s: None
    • Component/s: feed, process
    • Labels: None

    Description

      Whenever possible, I try to use Falcon with HCatalog. Falcon already injects several useful variables, like falcon_output_database and falcon_output_table, into a process, which let you parametrize your script.

      In some use-cases, however, even if you use feeds backed by Hive tables, having the path to the dataset that you want to create is useful, e.g.:

      • you run a Camus job to move fresh logs from Kafka to HDFS.

      Once Camus finishes, you would like to create a Hive partition on top of the newly created directory. Later, this directory becomes an input to ETL processes managed by Falcon, so you need a Hive table on top of it. Therefore, you need to know both the Hive table and the exact path to the partition.
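      The step above can be sketched as follows. This is a hypothetical helper, not a Falcon API; the table name, partition key, and path are examples of what a process script could do with an injected path:

```python
# Hypothetical sketch: after Camus has moved fresh logs from Kafka into an
# HDFS directory, register that directory as a partition of an external
# Hive table. add_partition_ddl is an illustrative helper, not a Falcon or
# Hive API; it only renders the HiveQL statement a script would run.
def add_partition_ddl(table, partition_spec, location):
    """Render HiveQL that puts a partition on top of an HDFS directory."""
    spec = ", ".join("%s='%s'" % (k, v) for k, v in partition_spec)
    return ("ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s) "
            "LOCATION '%s'" % (table, spec, location))

print(add_partition_ddl("raw_logs", [("dt", "2014-12-01")],
                        "/data/camus/raw_logs/2014-12-01"))
# ALTER TABLE raw_logs ADD IF NOT EXISTS PARTITION (dt='2014-12-01') LOCATION '/data/camus/raw_logs/2014-12-01'
```

      With an injected falcon_output_path, the LOCATION argument would come straight from Falcon instead of being hard-coded in the script.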

      • you want to remove an existing dataset before regenerating it, to prevent data duplication and make the operation idempotent

      e.g. some versions of Pig and HCatalog append to the existing dataset if the script is re-run (https://issues.apache.org/jira/browse/HIVE-8371). If you just drop the partition of an external table, the partition metadata is removed, but the data still exists in HDFS.
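      A minimal sketch of that two-step cleanup, assuming an injected falcon_output_path; the function name and arguments are illustrative, not a Falcon API. It shows why dropping the partition alone is not enough for an external table — the HDFS directory must be removed as well:

```python
# Sketch of the idempotent-cleanup idea: for an external table, dropping a
# partition only removes metastore metadata, so the backing HDFS directory
# has to be deleted too. With falcon_output_path injected, both commands
# can be driven from one value. cleanup_commands is an illustrative helper.
def cleanup_commands(table, partition_spec, falcon_output_path):
    """Return the HiveQL drop and the HDFS delete for one partition."""
    spec = ", ".join("%s='%s'" % (k, v) for k, v in partition_spec)
    drop = "ALTER TABLE %s DROP IF EXISTS PARTITION (%s)" % (table, spec)
    rm = "hadoop fs -rm -r -skipTrash %s" % falcon_output_path
    return [drop, rm]

for cmd in cleanup_commands("clicks", [("dt", "2014-12-01")],
                            "/data/clicks/dt=2014-12-01"):
    print(cmd)
```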

      Injecting a variable like falcon_output_path into the Falcon process could help here. The falcon_output_path could be taken directly from the Hive metastore (if the partition already exists), or constructed in some predefined way (if the partition doesn't exist yet).
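      One possible "predefined way" to construct the path is Hive's default warehouse layout; the warehouse root and naming scheme below are assumptions about that layout, not documented Falcon behaviour:

```python
# Sketch of deriving falcon_output_path before the partition exists,
# following Hive's default warehouse layout:
#   <warehouse>/<database>.db/<table>/<key>=<value>[/...]
# The warehouse root and the scheme itself are assumptions, not a Falcon API.
def falcon_output_path(database, table, partition,
                       warehouse="/user/hive/warehouse"):
    """Build the expected partition location for a Hive-managed table."""
    spec = "/".join("%s=%s" % (k, v) for k, v in partition)
    return "%s/%s.db/%s/%s" % (warehouse, database, table, spec)

print(falcon_output_path("logs", "clicks", [("dt", "2014-12-01")]))
# /user/hive/warehouse/logs.db/clicks/dt=2014-12-01
```

      When the partition already exists, the authoritative location would instead come from the metastore, since external tables can point anywhere in HDFS.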


          People

            Assignee: Unassigned
            Reporter: Adam Kawa (kawaa)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: