  1. Apache NiFi
  2. NIFI-3031

HiveQL processor improvements (Multi-Statement Scripts in PutHiveQL, CSV options in SelectHiveQL)

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.2.0
    • 1.2.0
    • None
    • None

    Description

      I am trying to use the PutHiveQL processor to execute a HiveQL script that contains multiple statements.

      For example:

      USE my_database;

      FROM my_database_src.base_table
      INSERT OVERWRITE refined_table
      SELECT *;

      – or –

      use my_database;

      create temporary table WORKING as
      select a,b,c from RAW;

      FROM RAW
      INSERT OVERWRITE refined_table
      SELECT *;

      The current implementation does not even accept a semicolon at the end of a single statement.

      Either use a default delimiter such as a semicolon to mark the statement boundaries within the file, or allow users to define their own.

      This makes pipelines testable, because the HiveQL is sourced from files rather than embedded in a product, and those scripts can be complex. Statements should run sequentially, in the order they appear, within the same JDBC session so that session-scoped features such as temporary tables work.
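
      The requested behaviour can be sketched as follows. This is only an illustration under stated assumptions, not the code that was committed for this issue: `split_statements` and `run_script` are hypothetical names, the cursor is assumed to be any DB-API 2.0 cursor, and the splitter ignores delimiters inside quoted strings but does not handle `--` comments.

```python
def split_statements(script, delimiter=";"):
    """Split a script into statements on `delimiter`, skipping quoted text.

    Hypothetical helper for illustration; comments are not handled.
    """
    statements = []
    current = []
    quote_char = None  # the quote character we are inside, if any
    i = 0
    while i < len(script):
        ch = script[i]
        if quote_char:
            current.append(ch)
            if ch == "\\" and i + 1 < len(script):
                # Keep a backslash-escaped character verbatim.
                current.append(script[i + 1])
                i += 1
            elif ch == quote_char:
                quote_char = None
        elif ch in ("'", '"'):
            quote_char = ch
            current.append(ch)
        elif script.startswith(delimiter, i):
            # Statement boundary: store what was collected so far.
            statements.append("".join(current).strip())
            current = []
            i += len(delimiter) - 1
        else:
            current.append(ch)
        i += 1
    statements.append("".join(current).strip())
    # Drop empty pieces, e.g. the one after a trailing delimiter.
    return [s for s in statements if s]


def run_script(cursor, script, delimiter=";"):
    """Execute each statement in order on one cursor (one session)."""
    for statement in split_statements(script, delimiter):
        cursor.execute(statement)
```

      Because every statement goes through the same cursor, a temporary table created by an earlier statement would still be visible to the later ones, and a trailing semicolon simply yields no extra statement.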

      Also, since SelectHiveQL offers CSV as an output format, a further improvement would be to add properties (defaulting to the existing behaviour) for things like "Include Header in Output", "Alternate CSV Header", "CSV Delimiter", "Quote CSV", and "Escape CSV".
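
      To make the effect of such options concrete, here is a rough sketch using Python's standard `csv` module. The function name `rows_to_csv` and its parameters are assumptions chosen to mirror the property names above; the real processor properties may behave differently. In particular, "Escape CSV" is simplified here to backslash-escaping when quoting is turned off.

```python
import csv
import io


def rows_to_csv(columns, rows, include_header=True, alternate_header=None,
                delimiter=",", quote=True):
    """Render rows as CSV text (hypothetical helper, for illustration).

    columns:          column names taken from the result set
    alternate_header: optional list of names to use instead of `columns`
    quote:            quote fields when needed; if False, special
                      characters are backslash-escaped instead
    """
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL if quote else csv.QUOTE_NONE,
        escapechar=None if quote else "\\",
        lineterminator="\n",
    )
    if include_header:
        writer.writerow(alternate_header if alternate_header else columns)
    writer.writerows(rows)
    return buf.getvalue()
```

      With the defaults, the output keeps a header row, a comma delimiter, and quoting, which matches the idea that the new properties should not change existing results unless they are set.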

          People

            mattyb149 Matt Burgess
            mattyb149 Matt Burgess
            Votes: 1
            Watchers: 3