Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Currently PutHiveStreaming (PHS) supports only a single concurrent task. Before NIFI-4342, this meant each target table needed its own PHS instance, which is cumbersome with large numbers of tables. After NIFI-4342, Expression Language can be used for SDLC purposes (e.g., changing the database/table between development and production).
However, it would be useful to support at least the database and table names via flow file attributes, and to allow multiple tasks to handle them concurrently. Due to the nature of PHS and the Streaming Ingest APIs (and their implementation), it is likely not prudent to allow two tasks to write to the same table and partition at the same time.
I propose adding Expression Language evaluation against flow file attributes where prudent, and allowing per-table concurrency in PHS. A thread will attempt to get a lock on a table; if it cannot, it will roll back and return.
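A minimal sketch of the proposed per-table locking, assuming each table is identified by a `database.table` string. `TableLockRegistry`, `tryLock`, `unlock`, and `writeIfAvailable` are hypothetical names for illustration, not part of NiFi. A task that cannot acquire the table returns `false` so the caller can roll back (in NiFi, via `ProcessSession.rollback()`) and return instead of blocking.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-table lock registry (illustrative only, not a NiFi API). */
public class TableLockRegistry {
    // Tables currently being written by some task; add/remove are atomic.
    private final Set<String> tablesInUse = ConcurrentHashMap.newKeySet();

    // Assumes database and table names contain no '.' (true for unquoted Hive identifiers).
    static String key(String database, String table) {
        return database + "." + table;
    }

    /** Non-blocking: returns false if another task already holds this table. */
    public boolean tryLock(String database, String table) {
        return tablesInUse.add(key(database, table));
    }

    public void unlock(String database, String table) {
        tablesInUse.remove(key(database, table));
    }

    /**
     * Runs the write only if the table is free, always releasing the lock afterwards.
     * Returns false without running the write if the table is busy; the caller
     * would then roll back its session and return.
     */
    public boolean writeIfAvailable(String database, String table, Runnable write) {
        if (!tryLock(database, table)) {
            return false;
        }
        try {
            write.run();
            return true;
        } finally {
            unlock(database, table);
        }
    }
}
```

A concurrent set is used here rather than a `ReentrantLock` per table so that a busy table is detected immediately and a second attempt from the same thread is also rejected; tasks targeting different tables never contend with each other.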