[NIFI-399] Rename EvaluateRegularExpression to ExtractText and optimize - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.1.0
Component/s: Extensions
Labels:
- deprecation

Description

The EvaluateRegularExpression processor extracts text from data using regular expressions. It currently stores only a single result per match. It should be updated to support multiple capture groups per match. The current behavior can be kept, and in addition all matching groups 0..n can be stored, with the group index appended to the base name of the attribute.
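The attribute naming described above could look roughly like the following sketch. It uses only `java.util.regex` and is not the NiFi processor API: the class `CaptureGroupAttributes`, the method `extract`, and the `.` separator between base name and group index are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NiFi: maps capture groups to attribute names.
final class CaptureGroupAttributes {

    // Returns attributes for the first match of the pattern in the content.
    // Group 1 is stored under the base name (the existing default behavior);
    // every group 0..n is also stored under baseName + "." + index.
    // The "." separator is an assumption; the issue only says the index is appended.
    static Map<String, String> extract(String baseName, Pattern pattern, String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()) {
            for (int i = 0; i <= matcher.groupCount(); i++) {
                String value = matcher.group(i);
                if (value == null) {
                    continue; // an optional group that did not participate in the match
                }
                attributes.put(baseName + "." + i, value);
                if (i == 1) {
                    attributes.put(baseName, value);
                }
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\w+)@(\\w+\\.com)");
        System.out.println(extract("email", pattern, "contact: joe@example.com"));
    }
}
```

With the sample input in `main`, `email` and `email.1` both hold `joe`, `email.0` holds the whole match `joe@example.com`, and `email.2` holds `example.com`.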

In addition, the name of this processor (and possibly its tags) needs to be updated. The processor is used to extract text from a given document, so the name should be 'ExtractText'. We can deprecate the old processor in 0.1.0 and remove it in 0.2.0.

In addition, this processor should:

- Precompile all patterns when the processor is scheduled to run.
- Create memory buffers no larger than the smaller of the flow file content size and the specified maximum buffer size.
- Support more than one capturing group. The default behavior of storing capture group 1 at the given name is good, but there is also benefit to supporting multiple capture groups in a single execution.
- Allow the user to specify the maximum length of a capturing group value.
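The requirements above can be sketched together in plain Java. This is a minimal illustration under stated assumptions, not the patch attached to this issue: the class `BoundedExtractor`, its constructor parameters, the choice of UTF-8, and the decision to truncate over-long values (rather than reject them) are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the NiFi implementation.
final class BoundedExtractor {
    private final Map<String, Pattern> compiled = new LinkedHashMap<>();
    private final int maxBufferSize;
    private final int maxCaptureLength;

    // Patterns are compiled once here, standing in for "when the processor is scheduled".
    BoundedExtractor(Map<String, String> regexByAttribute, int maxBufferSize, int maxCaptureLength) {
        for (Map.Entry<String, String> entry : regexByAttribute.entrySet()) {
            compiled.put(entry.getKey(), Pattern.compile(entry.getValue()));
        }
        this.maxBufferSize = maxBufferSize;
        this.maxCaptureLength = maxCaptureLength;
    }

    // The buffer never exceeds the smaller of the content length and the configured maximum.
    static int bufferSize(long contentLength, int maxBufferSize) {
        return (int) Math.min(contentLength, (long) maxBufferSize);
    }

    // Evaluates every precompiled pattern against at most maxBufferSize bytes of content.
    // Assumes UTF-8; cutting at a byte boundary may split a multi-byte character.
    Map<String, String> extract(byte[] content) {
        int size = bufferSize(content.length, maxBufferSize);
        String text = new String(content, 0, size, StandardCharsets.UTF_8);
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : compiled.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            if (matcher.find() && matcher.groupCount() >= 1 && matcher.group(1) != null) {
                String value = matcher.group(1);
                if (value.length() > maxCaptureLength) {
                    value = value.substring(0, maxCaptureLength); // cap the stored value
                }
                attributes.put(entry.getKey(), value);
            }
        }
        return attributes;
    }
}
```

Compiling in the constructor means the cost of `Pattern.compile` is paid once per configuration instead of once per flow file, which is the point of the first requirement.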

This also prompts the need for a StandardValidator that allows creation of a validator which performs a bounds check on a given DataSize.
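A bounds check on a data size might be sketched as follows. This is a standalone illustration rather than NiFi's validator API: the class `DataSizeBounds`, its methods, the accepted units (B, KB, MB, GB), and the use of 1024-based multipliers are assumptions.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a data-size bounds check; not the NiFi Validator interface.
final class DataSizeBounds {
    // Accepts values such as "512 B", "10 KB", "1MB" (units and format are assumptions).
    private static final Pattern DATA_SIZE = Pattern.compile("(\\d+)\\s*(B|KB|MB|GB)");

    // Converts a data-size string to bytes, using 1024-based units.
    static long toBytes(String dataSize) {
        Matcher matcher = DATA_SIZE.matcher(dataSize.trim().toUpperCase(Locale.ROOT));
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a data size: " + dataSize);
        }
        long value = Long.parseLong(matcher.group(1));
        switch (matcher.group(2)) {
            case "KB": return value * 1024L;
            case "MB": return value * 1024L * 1024L;
            case "GB": return value * 1024L * 1024L * 1024L;
            default:   return value; // plain bytes
        }
    }

    // True only if the input parses as a data size and lies within [minBytes, maxBytes].
    static boolean isWithinBounds(String dataSize, long minBytes, long maxBytes) {
        try {
            long bytes = toBytes(dataSize);
            return bytes >= minBytes && bytes <= maxBytes;
        } catch (IllegalArgumentException e) {
            return false; // unparseable input is treated as invalid
        }
    }
}
```

For example, `DataSizeBounds.isWithinBounds("1 MB", 1, 2L * 1024 * 1024)` accepts the value, while `"3 MB"` with the same bounds is rejected.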

Attachments

NIFI-399.patch
19/Mar/15 05:30
68 kB
Joe Witt

Sub-Tasks

Remove EvaluateRegularExpression (breaking change)

Resolved

Unassigned

People

Assignee: Joe Witt

Reporter: Joe Witt

Votes: 0

Watchers: 3

Dates

Created: 08/Mar/15 17:38

Updated: 23/Mar/15 12:23

Resolved: 19/Mar/15 14:54