Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
Create Puffin, which allows a programming model similar to the awk scripting language.
An awk program is a collection of rules, each of which is a pair: a predicate and an action. For each line in a file, the rules are applied in sequence, and if a rule's predicate evaluates to true, its action is executed. Then awk goes on to the next line.
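The rule model described above can be sketched in a few lines of plain Java. This is only an illustration under assumptions, not Puffin's implementation: the names `RuleSketch`, `Rule`, and `run` are hypothetical and do not appear in the issue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of an awk-style rule engine; not the Puffin API. */
class RuleSketch {
  /** A rule pairs a predicate with an action. */
  static final class Rule {
    final Predicate<String> predicate;
    final Consumer<String> action;

    Rule(Predicate<String> predicate, Consumer<String> action) {
      this.predicate = predicate;
      this.action = action;
    }
  }

  /** Applies every rule, in order, to one line before moving to the next. */
  static void run(List<String> lines, List<Rule> rules) {
    for (String line : lines) {
      for (Rule rule : rules) {
        if (rule.predicate.test(line)) {
          rule.action.accept(line);
        }
      }
    }
  }

  public static void main(String[] args) {
    int[] n = {0}; // mutable counter captured by the lambda
    List<Rule> rules = Arrays.asList(
        new Rule(line -> !line.startsWith("#"), line -> n[0]++));
    run(Arrays.asList("# comment", "a", "b"), rules);
    System.out.println("counter: " + n[0]);
  }
}
```

Keeping the rules in an ordered list preserves the "applied in sequence" behavior: several rules may fire for the same line, in the order they were declared.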
Here is a simple awk script that counts the number of non-comment lines in a file:
    !/^#/ {
      ++n;
    }
    END {
      printf("counter: %d\n", n);
    }
Here is the equivalent Puffin program:
    Puffin.Program<Unit> program =
        Puffin.builder(() -> Unit.INSTANCE, u -> new AtomicInteger())
            .add(line -> !line.startsWith("#"),
                line -> line.state().incrementAndGet())
            .after(context ->
                context.println("counter: " + context.state().get()))
            .build();
In Puffin, each predicate is a Predicate<Line>, and each action is a Consumer<Line>. Line is a data structure that gives access to the text of the line, regular expression matching, and file-local and global state.
Puffin allows thread-safe parallel processing of multiple files (or more generally sources, including URLs). File-local state is allocated by a factory, and each file is processed in a single thread. Therefore rules do not need to coordinate with rules processing other files.
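To make the file-local state idea concrete, here is a minimal sketch in plain Java. It is an assumption-laden illustration, not how Puffin is implemented: `FileStateSketch`, `Source`, `Counter`, and `countNonComments` are hypothetical names, and a parallel stream stands in for whatever threading Puffin actually uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch of file-local state; not the Puffin API. */
class FileStateSketch {
  /** Plain, non-thread-safe counter: safe because only one thread touches it. */
  static final class Counter {
    int n;
  }

  /** A named source of lines, standing in for a file or URL. */
  static final class Source {
    final String name;
    final List<String> lines;

    Source(String name, List<String> lines) {
      this.name = name;
      this.lines = lines;
    }
  }

  /** Processes sources in parallel; each source gets fresh state from the factory. */
  static Map<String, Integer> countNonComments(List<Source> sources,
      Supplier<Counter> stateFactory) {
    Map<String, Integer> results = new ConcurrentHashMap<>();
    sources.parallelStream().forEach(source -> {
      Counter state = stateFactory.get(); // state private to this source
      for (String line : source.lines) {  // one thread handles the whole source
        if (!line.startsWith("#")) {
          state.n++;                      // no synchronization needed
        }
      }
      results.put(source.name, state.n);
    });
    return results;
  }

  public static void main(String[] args) {
    List<Source> sources = Arrays.asList(
        new Source("a", Arrays.asList("# x", "1", "2")),
        new Source("b", Arrays.asList("# y", "3")));
    System.out.println(countNonComments(sources, Counter::new));
  }
}
```

Because the factory is called once per source, the counter can be an ordinary `int`; only the map that collects the per-source results is shared and therefore concurrent.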
Global state is also allocated by a factory, but it is shared, and rules must coordinate when they access it. In the above example, () -> Unit.INSTANCE is the factory that creates global state (hence Program<Unit>), and u -> new AtomicInteger() is the factory that creates the state for each file.
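The need for coordination on shared state can be illustrated with a short plain-Java sketch. This is not Puffin code: `GlobalStateSketch` and `countAll` are hypothetical names, and using an `AtomicInteger` is just one assumed way for rules to coordinate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch of shared global state; not the Puffin API. */
class GlobalStateSketch {
  /** Counts non-comment lines across all sources, processed in parallel. */
  static int countAll(List<List<String>> sources,
      Supplier<AtomicInteger> globalStateFactory) {
    // Created once and shared by every thread.
    AtomicInteger global = globalStateFactory.get();
    sources.parallelStream().forEach(lines -> {
      for (String line : lines) {
        if (!line.startsWith("#")) {
          global.incrementAndGet(); // atomic update coordinates concurrent writers
        }
      }
    });
    return global.get();
  }

  public static void main(String[] args) {
    List<List<String>> sources = Arrays.asList(
        Arrays.asList("# x", "1", "2"),
        Arrays.asList("# y", "3"));
    System.out.println("total: " + countAll(sources, AtomicInteger::new));
  }
}
```

With a plain `int` in place of the `AtomicInteger`, concurrent increments from different sources could be lost, which is the hazard the shared-state caveat refers to.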
Issue Links
- is related to CALCITE-5765 Add LintTest, to apply custom lint rules to source code (Closed)